WBConfCall 2020.08.20-Agenda and Minutes
Agenda
WS279 Data Freeze
- When?
Help Desk
- uncharacterized eukaryotic protein-coding gene with an intron
- Help with annotation file for ws235 version
- Email on Monday Aug 17
Handling species lists
- Handling species in WormBase and Alliance
- Want to dump "affected_by_pathogen" field for phenotypes listing pathogenic species
- Want/need to synchronize Postgres with Hinxton and ACEDB
- Would like a primary species list
- Should clean up existing species data, then possibly move to a centralized species tool at the Alliance
- Source data: NCBITaxonomy
- There is some churn on taxonomy ids, e.g. one taxonomy id is merged into another (although it appears to be relatively infrequent compared to other CVs or ontologies)
- We will need a plan for tracking changes and updating our curation data as needed
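One possible starting point for tracking taxonomy ID churn, sketched under assumptions: the NCBI taxonomy dump ships a `merged.dmp` file whose rows pair an old taxonomy ID with the ID it was merged into, with fields separated by `|`. The function names, the `curated` mapping, and the sample rows below are illustrative only; they do not come from WormBase data or tooling.

```python
def parse_merged_dmp(lines):
    """Parse merged.dmp-style rows into {old_taxid: new_taxid}."""
    merged = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            merged[int(fields[0])] = int(fields[1])
    return merged


def remap_taxids(curated, merged):
    """Replace merged taxonomy IDs and report which entries changed."""
    updated, changes = {}, []
    for name, taxid in curated.items():
        new_id = merged.get(taxid, taxid)
        updated[name] = new_id
        if new_id != taxid:
            changes.append((name, taxid, new_id))
    return updated, changes


# Illustrative rows in merged.dmp layout (not real curation data).
sample_rows = ["12\t|\t74109\t|\n", "30\t|\t29\t|\n"]
merged = parse_merged_dmp(sample_rows)

# Hypothetical curated species-to-taxon mapping.
curated = {"Example species": 12, "Caenorhabditis elegans": 6239}
updated, changes = remap_taxids(curated, merged)
print(changes)
```

The `changes` list is the part a curator would review before any curation data is updated.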
Minutes
WS279 Data Freeze
- Michael: WS279 depends on when the WS278 build finishes. WS278 will hopefully be done by the end of next week or soon after.
Help Desk
- uncharacterized eukaryotic protein-coding gene with an intron
- https://github.com/WormBase/website/issues/7829
- Need more information; maybe Cecilia should ask the user.
- Help with annotation file for ws235 version
- Email on Monday Aug 17
- Need something parsed from GFF.
- Raymond will ask who sent them the original file.
- Exon Rank is an Ensembl concept; the file might have come from Ensembl BioMart, not WormBase.
- How much formatting do we owe users? We could give them a GFF3 and let them figure it out.
- For WS220 they may have used WormMart, which we no longer have.
- Suggested reply: "It looks like you used a service we no longer provide. We have the data in a different format; is that okay?"
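If we do send a GFF3, a user could derive an exon rank themselves. The sketch below assumes that "exon rank" means the 1-based position of an exon within its transcript in 5' to 3' order, and that exon rows carry a `Parent` attribute naming the transcript. The function name and the sample rows are hypothetical, not taken from a WormBase release.

```python
from collections import defaultdict


def exon_ranks(gff3_lines):
    """Return {transcript_id: [(rank, start, end), ...]} from GFF3 exon rows."""
    exons = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end, strand = int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon shared by several transcripts lists them comma-separated.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((start, end, strand))
    ranked = {}
    for parent, items in exons.items():
        # On the minus strand the 5'-most exon has the highest coordinate.
        reverse = items[0][2] == "-"
        ordered = sorted(items, key=lambda e: e[0], reverse=reverse)
        ranked[parent] = [(i, s, e) for i, (s, e, _) in enumerate(ordered, 1)]
    return ranked


# Made-up rows for illustration only.
sample_gff3 = [
    "##gff-version 3\n",
    "I\texample\tgene\t100\t800\t.\t+\t.\tID=Gene:G1\n",
    "I\texample\texon\t300\t400\t.\t+\t.\tParent=Transcript:T1\n",
    "I\texample\texon\t100\t200\t.\t+\t.\tParent=Transcript:T1\n",
    "I\texample\texon\t500\t600\t.\t-\t.\tParent=Transcript:T2\n",
    "I\texample\texon\t700\t800\t.\t-\t.\tParent=Transcript:T2\n",
]
ranks = exon_ranks(sample_gff3)
print(ranks)
```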
Handling species lists
- Chris G wanted to dump pathogenic species and find out how they align with ACEDB species. How do species get into Hinxton?
- Genome driven and added to geneace as needed.
- Sort out our data first.
- Make sure every entry has a taxonomy ID to avoid redundant entries.
- We have almost 8000 species in ACEDB; about 1900 don't have a taxon ID. The NCBI taxonomy tool can get taxon IDs and preferred names for 1400 of them. Some that don't map look lab-specific, some are Caenorhabditis species, and some are typos. We don't know whether they are being used in annotations, but they are zombie objects in ACEDB, probably auto-created; there are 455 of them.
- Need to grep through the ace files at each ACEDB source. They are probably not from the bioblast or homology sources. Papers could be a source for these.
- Could create 2-way XREF everywhere a Species is used in the model.
- Kimberly and Juancarlos could deal with Paper-based problems.
- Molecule could also be an issue.
- We should curate to taxonomy IDs instead of species names.
- Do the grep first, maybe change model eventually.
- Have to check that none of the URLs cause problems.
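The grep step could look something like the sketch below. It assumes .ace-style text in which each object starts with a `Class : "Name"` line, objects are separated by blank lines, and species references appear as `Species "Name"` tag lines. The helper names, the sample objects, and the `taxon_ids` lookup are hypothetical; the actual tag layout differs by class and would need checking against the models.

```python
import re
from collections import Counter

HEADER = re.compile(r'^(\w+)\s*:\s*"([^"]+)"')
SPECIES_TAG = re.compile(r'^Species\s+"([^"]+)"')


def species_usage(ace_text):
    """Count (species name, referencing class) pairs in .ace-style text."""
    usage = Counter()
    current_class = None
    for raw in ace_text.splitlines():
        line = raw.strip()
        if not line:
            current_class = None  # a blank line ends the current object
            continue
        if current_class is None:
            header = HEADER.match(line)
            if header:
                current_class = header.group(1)
            continue
        tag = SPECIES_TAG.match(line)
        if tag:
            usage[(tag.group(1), current_class)] += 1
    return usage


def without_taxon_id(usage, taxon_ids):
    """List species names that are referenced but have no known taxon ID."""
    return sorted({name for name, _ in usage if name not in taxon_ids})


# Made-up objects for illustration only.
sample_ace = """
Paper : "ExamplePaper1"
Species "Caenorhabditis elegans"
Species "Caenorhabditis sp. labX"

Molecule : "ExampleMolecule1"
Species "Caenorhabditis sp. labX"
"""
taxon_ids = {"Caenorhabditis elegans": 6239}  # hypothetical lookup table
usage = species_usage(sample_ace)
unmapped = without_taxon_id(usage, taxon_ids)
print(unmapped)
```

Grouping the counts by class shows which data sources (for example Paper or Molecule) reference the unmapped names.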