WBConfCall 2020.08.20-Agenda and Minutes

From WormBaseWiki
Latest revision as of 08:42, 3 September 2020

Agenda

WS279 Data Freeze

  • When?

Help Desk

Handling species lists

  • Handling species in WormBase and Alliance
    • Want to dump "affected_by_pathogen" field for phenotypes listing pathogenic species
    • Want/need to synchronize Postgres with Hinxton and ACEDB
    • Would like a primary species list
    • Should clean up existing species data, then possibly move to a centralized species tool at the Alliance
    • Source data: NCBITaxonomy
      • There is some churn in taxonomy IDs, e.g. one taxonomy ID is merged into another (although this appears to be relatively infrequent compared to other CVs or ontologies)
      • We will need a plan for tracking changes and updating our curation data as needed
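
One possible way to track merged taxonomy IDs is sketched below. It assumes the merge information comes from the `merged.dmp` file in the NCBI taxonomy dump (old ID and new ID in pipe-delimited columns). The function names and the sample IDs are hypothetical and were not discussed on the call.

```python
def parse_merged_dmp(lines):
    """Map each retired taxon ID to the ID it was merged into."""
    merged = {}
    for line in lines:
        # Columns are separated by "|" with surrounding tabs.
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            merged[fields[0]] = fields[1]
    return merged


def update_taxon_ids(curated_ids, merged):
    """Return (updated_ids, changes) for a list of curated taxon IDs."""
    updated, changes = [], []
    for tax_id in curated_ids:
        current = tax_id
        seen = set()
        # Follow chains of merges, guarding against cycles.
        while current in merged and current not in seen:
            seen.add(current)
            current = merged[current]
        updated.append(current)
        if current != tax_id:
            changes.append((tax_id, current))
    return updated, changes


# Made-up sample lines in the assumed layout: old_tax_id | new_tax_id |
sample = ["900001\t|\t900002\t|\n", "900002\t|\t900003\t|\n"]
merged = parse_merged_dmp(sample)
updated, changes = update_taxon_ids(["6239", "900001"], merged)
```

The `changes` list records which curated IDs were rewritten, which could feed a report for curators before any data is updated.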

Minutes

WS279 Data Freeze

  • Michael: WS279 depends on when the WS278 build finishes. WS278 will hopefully be done by the end of next week or soon after.

Help Desk

  • uncharacterized eukaryotic protein-coding gene with an intron
    • https://github.com/WormBase/website/issues/7829
    • Need more information, maybe Cecilia should ask them.
  • Help with annotation file for WS235 version
    • Email on Monday Aug 17
    • Need something parsed from GFF.
    • Raymond will ask who sent them the original file.
    • Exon Rank is an Ensembl thing; the file might be from Ensembl BioMart, not WormBase.
    • How much formatting do we owe users? Could give them a GFF3 and let them figure it out.
    • For WS220 they may have used WormMart, which we don't have anymore.
    • "It looks like you used a service we don't provide anymore. We have the data in a different format, is that okay?"

Handling species lists

  • Chris G wanted to dump pathogenic species and find out how they align with acedb species. How do they get into Hinxton?
  • Genome driven and added to geneace as needed.
  • Sort out our data first.
  • Make sure every entry has a taxonomy ID to avoid redundant entries.
  • We have almost 8000 species in acedb; about 1900 don't have a taxon ID. The NCBI taxonomy tool can get 1400 taxon IDs and preferred names. Of those that don't map, some look lab-specific, some are Caenorhabditis species, and some are typos. We don't know if they're being used in annotations, but they are zombie objects in acedb, probably auto-created. There are 455 of them.
  • Need to grep through the ace files at each acedb source. Probably not from the bioblast or homology sources. Papers could be a source for these.
  • Could create 2-way XREF everywhere a Species is used in the model.
  • Kimberly and Juancarlos could deal with Paper-based problems.
  • Molecule could also be an issue.
  • We should curate to taxonomy IDs instead of species names.
  • Do the grep first, maybe change model eventually.
  • Have to check that every URL still works without problems.
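
The "grep through ace files" step could look roughly like the sketch below. It assumes .ace objects are separated by blank lines, start with a `Class : "name"` header, and reference species through a `Species "name"` tag line. The function name, the sample objects, and the species name are hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern: a tag line such as  Species "Some name"
SPECIES_TAG = re.compile(r'^\s*Species\s+"([^"]+)"')


def find_species_usage(ace_text, names_without_taxon):
    """Return {species name: [headers of objects that reference it]}."""
    usage = defaultdict(list)
    # .ace objects are assumed to be separated by blank lines.
    for block in re.split(r"\n\s*\n", ace_text.strip()):
        lines = block.splitlines()
        if not lines:
            continue
        header = lines[0].strip()
        for line in lines[1:]:
            match = SPECIES_TAG.match(line)
            if match and match.group(1) in names_without_taxon:
                usage[match.group(1)].append(header)
    return dict(usage)


# Made-up sample objects for illustration only.
sample_ace = '''Paper : "WBPaper00000001"
Species "Caenorhabditis elegans"

Paper : "WBPaper00000002"
Species "Caenorhabditis sp. lab1"

Molecule : "WBMol:00000001"
Species "Caenorhabditis sp. lab1"
'''

hits = find_species_usage(sample_ace, {"Caenorhabditis sp. lab1"})
```

Species names that never show up in the result for any source would be candidates for the unreferenced, auto-created objects mentioned in the minutes.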