WBConfCall 2023.09.21–Agenda and Minutes

Agenda

Help Desk

Sequence feature curation

  • Gary Williams was curating sequence features for expression annotations, for example enhancer regions sufficient to drive expression in a certain tissue.
  • We are not actively curating this at the moment, but a couple of annotations need Sequence Feature objects (found while cleaning up data for Alliance).
  • In the short term we would only need to curate Sequence Feature objects for these 2 existing papers, but would doing so require access to the F map?
  • Example of a sequence feature connected to expression: https://wormbase.org/species/all/expr_pattern/Expr11278#02--10
  • https://github.com/WormBase/website/issues/9328

ftp site synchronization

  • GO is waiting for the WB GAF to complete their September release
  • Are there still problems with synchronizing the Hinxton and WB ftp sites?
    • If so, can we temporarily point GO to the Hinxton ftp site or will they have the same problem?
  • https://github.com/WormBase/website/issues/9319

Caltech-Nameserver options

  • How should Variation and Strain objects be created from Caltech curation, and what options are possible?

Minutes

Sequence Feature Curation

  • Daniela - Cannot do full seq feat curation, but could curate a few here and there. Needs to know how to access the tools for it.
  • Stavros - Enhancement curated in geneaces. Can look at documentation. Gary left a couple of guides. It will take some time; will try before the next build starts.
  • Daniela - Not urgent, for transfer of expression data at Alliance to persistent store. Only objects associated to genes for now. Have some months.
  • Stavros - github ticket helps keep it in mind. Maybe for 292.

ftp site synchronization

  • Stavros - ftp error at EBI was probably from a DDoS attack; admins have it working again.
  • Paulo - Tried to sync symlinks, but most of OICR is owned by Todd (not at meeting), so not sure it's synced properly, waiting to hear from him. Could manually sync specific files Kimberly needs.
  • Stavros - http and https should work.
  • Paulo - Sync is all done through Perl, but it's not working well right now. Can see Aug 31st files from last sync. Will take a look.
  • Later Kimberly and Todd joined the call, but Todd had to leave, so will look into this later on.

Caltech-Nameserver options

  • Hoped to talk with Caltech curators and Manuel about Caltech-Nameserver, but people are at Machine Learning in Biocuration workshop.

What can we do about transferring things to Alliance taking so long?

  • Paul S - Alliance won't have a home for most of the data for a while. Should we keep curation going or save energy from the build?
  • Todd - Can't have a partial build.
  • Paul - How much data do we need before we can stop doing the build.
    • Not much RNAi data, now more CRISPR.
    • In the context of Genetics article and for planning, go through all datatypes, figure out what's missing in Alliance.
    • Some stuff won't get organized, like CeNGEN; it's about a UI switch and will be handled separately. Will take work, but won't need the A-team.
    • Go through everything class by class and see what's worth doing. Find some temporary solutions if necessary.
  • Todd - How much is getting curated per dataclass, if not active... (maybe don't worry about it ?)
    • Users can have separate Alliance and WB browser tabs open.
    • Gene name changes are important.
    • Make changes available from Caltech.
  • Paul - Point out to users what's missing at Alliance.
  • Todd - Invert user flow, instead of starting at WB, start at Alliance.
  • Paul - Curation, figure out what's active. We'd like to curate this, but instead do GO-Cam or something else. Until the Alliance persistent store can handle curation. We have more curation that needs doing than we can do.
  • Sarah - What's the granularity of what users need more, prioritize based on user use.
  • Todd - Could do that, but it's not that easy to figure out, e.g. the gene page has lots of data and we don't know what users are looking at. Could figure things out from publication trends.
  • Raymond - Also biocuration automation.
  • Sarah - With Magda would look at datatypes and figure out what would or wouldn't have a home. Apollo coming in as we move away from ace needs discussion and defining.
  • Paul - Discussion about sequence and Apollo.
  • Sarah - Apollo has a lot of backend and not much front end, so not very usable. Nothing before 6 (?) months.
  • Stavros - Some developer changes at Apollo.
  • Paul - If not for 1.5 years, is it close enough to freeze what we have until then.
    • Need Apollo to handle other species.
    • What do we want in 2 years.
    • Plan that out, then figure out timeline and transition.
  • Stavros - Feb 2024 was timeline, but probably won't happen.
  • Paul - A 6 month gap is okay. Good Stavros is helping Apollo. If it's good in 1.5 years, how will that look.
  • Sarah - Still need to figure out how Apollo will communicate with Alliance.
  • Paul - Information stored in Apollo backend.
  • Sarah - Need to revisit inventory as Apollo develops.
  • Stavros - Could know more if the Alliance gene and gene features group weren't on hold. No one else willing to lead the group.
  • Paul - Stavros, Gil, Scott. What do we need to move it forward.
  • Scott - needs sequence curators, but he can handle manipulating existing sequence data.
  • Stavros - A bunch of Alliance names, Kalpana was one of them.
  • Paul - WB needs it more than anyone else, will have to lead it. Will ask people for advice, figure things out over next couple of months.
  • Sarah - A couple of years on WB grant for migration. What doesn't exist at Alliance that needs to exist.
  • Paul - WB already funded, it's about Alliance.
  • Sarah - Generic integration with Apollo in the Alliance space. Get letter of support.
  • Paul - Letter of support for creating dependencies, we can only have 10 letters of support.
    • Ensembl can integrate information with other stuff. Need to spell out what we can do beyond GFF files.
  • Sarah - Don't do richness of other datatypes, that's what comes from WB and other mods.
    • What's happening with generic comparator species.
  • Paul - What do we want from Ensembl genome. A set of features.
  • Sarah - Homology, metazoa (?) space, not just worms, more broad.
  • Paul - Get orthology from NCBI, PANTHER, other stuff.
    • What's the status of regulatory elements?
  • Sarah - Human, mouse is okay, beyond that, no.
    • Have expression atlas, connect to UniProt.
  • Paul - Think about use cases, models, tier 2 species, what gives value.
    • Big picture is: spell out things, there's a way forward, no need to panic.
    • Decrease curation of some stuff unless it goes straight into Alliance.
  • Sarah - A lot of Stavros's time is running the build, could have more curation. Maybe every 6 months.
  • Paul - Little advantage in more frequent builds.
  • Todd - 12 month cycle.
  • Paul - 9 month cycle, only on stage for 3 months.
  • Paul or Todd - Can generate flat file from Caltech with new data.
  • Juancarlos - We can generate .ace and diff with other .ace and have a flatfile of things that changed, but I don't know how that would get represented on which website. Curators would need to give input.
  • Paul - What would Stavros like.
  • Stavros - Do fewer builds. Wait for Apollo and do annotations there.
  • Todd - Build is onerous, a lot of moving parts, a lot of centres.
    • Talked about moving out of Hinxton, historical stuff we can't scrape away.
    • Went from 1 week build to 3 months.
    • Quarterly is fine, 6 months would be fine. No preference one way or another.
  • Stavros - Gene names change, and get help desk questions about when that will go live.
  • Juancarlos - Can we get gene names from nameserver directly?
  • Stavros - Doesn't have a lot of curation information.
  • Paul - Key is gene names, is that nameserver?
  • Stavros - yes, but doesn't have enough data. Could maybe be possible for gene names, person, lab, papers. Only need computation for orthologs and other things.
  • Juancarlos - Could build API for person, lab, papers for main website to get data live from there, so data is up to date, but wouldn't be part of search, just have most recent data.
  • Raymond - For a full build it's difficult to do diffs, something Kevin said a while back about updates and connections to processes.
    • Moving to LinkML modeling. Working with Alliance on harmonization is rate limiting.
  • Paul - Alliance has ID minting.
  • Mark QT - Only generates AGRKB IDs, doesn't track names or history.
  • Stavros - Is Caltech postgres a view of acedb schema?
  • Juancarlos - Denormalized view of how curators look at things for ease of curation and generate .ace files.
  • Stavros - acedb harder to maintain, related to WB crashes ?
  • Todd - Keeps him up at night too.
  • Paul - Move on from acedb
  • Raymond - Valerio dockerized it.
  • Todd - acedb crashes within docker.
  • Adam - There's a script that checks something because it doesn't restart itself properly.
  • Paul - Keep thinking about it, raised the right questions.
  • Stavros - Apollo is going to run with MongoDB as backend.
  • Juancarlos - Build API for paper, person, lab ?
  • Todd - Not critical yet
  • Stavros - Apollo can track paper-gene connections, could expand to handle persons and other stuff.
  • Kimberly - Could use some discussion.
  • Stavros - Can present in a month.
  • Paul - Thanks for having this discussion.
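
The .ace diff idea Juancarlos raised above (generate a new .ace, diff it against an earlier one, and write out a flat file of what changed) could look roughly like the sketch below. This is only an illustration under assumptions, not the Caltech pipeline: the function names `parse_ace`, `diff_ace` and `to_flat_file` are hypothetical, objects are assumed to be blank-line-separated paragraphs headed by `Class : "name"`, `//` lines are treated as comments, .ace delete directives such as `-D` are not handled, and the gene IDs and names in the usage note are made-up placeholders.

```python
import re
from typing import Dict, List, Set, Tuple

AceKey = Tuple[str, str]          # (class, object name)
Row = Tuple[str, str, str, str]   # (change, class, object name, tag line)


def parse_ace(text: str) -> Dict[AceKey, Set[str]]:
    """Parse .ace text into {(class, name): set of tag lines}.

    Assumes one object per blank-line-separated paragraph whose first
    line looks like: Class : "name". Lines starting with // are skipped.
    """
    objects: Dict[AceKey, Set[str]] = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines()]
        lines = [ln for ln in lines if ln and not ln.startswith("//")]
        if not lines or ":" not in lines[0]:
            continue  # not an object header; ignore in this sketch
        cls, _, name = lines[0].partition(":")
        key = (cls.strip(), name.strip().strip('"'))
        objects.setdefault(key, set()).update(lines[1:])
    return objects


def diff_ace(old_text: str, new_text: str) -> List[Row]:
    """Return one row per object or tag line that differs between two dumps."""
    old, new = parse_ace(old_text), parse_ace(new_text)
    rows: List[Row] = []
    for key in sorted(old.keys() | new.keys()):
        cls, name = key
        if key not in old:
            rows.append(("new_object", cls, name, ""))
        if key not in new:
            rows.append(("deleted_object", cls, name, ""))
        before, after = old.get(key, set()), new.get(key, set())
        for line in sorted(after - before):
            rows.append(("added", cls, name, line))
        for line in sorted(before - after):
            rows.append(("removed", cls, name, line))
    return rows


def to_flat_file(rows: List[Row]) -> str:
    """Render the rows as tab-separated text, one change per line."""
    return "\n".join("\t".join(row) for row in rows)
```

For example, with two placeholder dumps:

```python
old = 'Gene : "WBGene99999991"\nPublic_name "abc-1"\n'
new = 'Gene : "WBGene99999991"\nPublic_name "abc-2"\n'
print(to_flat_file(diff_ace(old, new)))
```

this would list the new `Public_name` line as added and the old one as removed. How such a flat file would be shown, and on which website, is the open question from the discussion and would need curator input.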