Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
Jump to navigationJump to search
Line 121: Line 121:
 
* How do we handle cases where one peptide maps to multiple genes?
 
* How do we handle cases where one peptide maps to multiple genes?
 
* How should large scale expression data be incorporated into enrichment analysis?
 
* How should large scale expression data be incorporated into enrichment analysis?
 +
 +
=== GitHub tracker for Caltech curation issues ===
 +
* "Caltech curation" repository - primarily used for creating curation tool set for Caltech team
 +
* "WormBase curation" repository - last used in 2015
 +
* Should there be a single repository for tracking? Yes, use the "wormbase-curation" repository
 +
* Curators will start submitting tickets in the "wormbase-curation" repository

Revision as of 19:51, 14 December 2017

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

January

February

March

April

May

June

July

August

September

October

November


December 7th, 2017

Next Citace upload

  • Submit files to Wen by Jan 16, 2018
  • There shouldn't be model changes (that affect Caltech) for this next upload
  • Juancarlos goes on vacation Dec 19th

WB Curator candidate

  • April, could join WB for 1-2 years
  • Could possibly interview next Thursday
  • Could curate sequence features, enhancers, alleles/mutations/variations?
  • Could start in January

Expression clusters

  • Wen and Raymond discussed; will add an "uncertain" tag or something similar
  • Tag can be used to filter out datasets that are complex (like mixed cell/tissue types)

Automated descriptions & expression clusters

  • Many expression clusters are associated with infection
  • Such data not currently incorporated into automated gene descriptions
  • Wen can talk with Ranjana to incorporate (particularly for info-sparse genes)
  • Can use GO terms, but need to be careful about how they are applied (what relationships they have) to genes

Author First Pass Form

Overall approach

  • AFP form would move from being just a flagging pipeline towards a validation and data entry pipeline
  • Also move from free text boxes towards autocompletes, controlled vocabularies
  • If this is a validation and a portal for data entry forms, should we display everything we can mine?
  • Idea is to eventually create data entry forms for as many data types as possible
  • Long term goal would include linking to TPC and evidence sentences for validation
  • Data types not curated by WB would need to be re-evaluated: continue to mine but share with other curation groups? no longer mine? new WB objects or pages?
  • Periodically review the form and add/remove data types and curation forms as needed
  • Would be good to ask authors if they had to omit data from the recently published paper and if they want to micropublish

Questions

  • Question for WB curators: Are we missing data types?
  • Question for WB curators: Are there data types that we don't curate (e.g. domain analysis) but for which we could share mining information with other groups, e.g. UniProt?
  • Question for WB curators: Are there data types we should no longer mine?
  • Question for Wen: Does it make sense to include an afp flag for expression cluster?
  • Question for Karen: What afp tables are populated from the Genetics/G3 pipeline?
Details here: http://wiki.wormbase.org/index.php/Genetics_Markup_by_Textpresso_and_First_Pass
Screen Shot 2017-12-07 at 9.28.56 AM.png

Thoughts

  • Could we add to citation index score as incentive to provide validation/data?
  • How do we make sure we are not turning off/overwhelming users/participants/authors?
  • Can gamify the forms; give points?
  • Create an app that has user-friendly data submission tools/forms?


December 14th, 2017

MOD data in Data Commons

  • NIH Data Commons wants DB data in common formats
  • We've wanted APIs or web service to allow access all data in standard format
  • Good to have one place for all data from all MODs
  • This should be a work product of the AGR
  • Consider priority: data site, website
  • AGR working groups work towards data standards
  • GAF (gene association file) could be used now; what can we learn about the GAF: development, usage?
  • GAF hasn't changed much in last ~10 years

Contingency plan for Juancarlos' vacation

  • If Tazendra crashes/dies, we can perform a hard drive swap at Caltech

Essential genes

  • David Fay asked for list of essential C. elegans genes
  • David asked for genes whose mutants are inviable as homozygotes, ignoring RNAi
  • Could we generate a file for users? Can we make a conservative list with ~no false positives
  • What are all the criteria (that are curated) that could be used to decide whether a gene is essential?
  • How does any user define "essential"? What are the cutoffs/thresholds?
  • Best option may be to provide a table with all relevant phenotype annotation attributes and let user decide based on filters

Expression certainty

  • Proteomics experiments have subjective thresholds
  • How do we handle cases where one peptide maps to multiple genes?
  • How should large scale expression data be incorporated into enrichment analysis?

GitHub tracker for Caltech curation issues

  • "Caltech curation" repository - primarily used for creating curation tool set for Caltech team
  • "WormBase curation" repository - last used in 2015
  • Should there be a single repository for tracking? Yes, use the "wormbase-curation" repository
  • Curators will start submitting tickets in the "wormbase-curation" repository