Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki

Revision as of 18:38, 26 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, because both depend on the ratio of positives to negatives in the test set
  • For example shown, "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention); does not depend on distribution of sets
  • Applied to various data types
  • Analysis: 10-fold cross validation
    • Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than random negative
  • AUC values for many WB data types range from the upper 80s into the 90s (percent)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael with better training sets if you want an improved SVM
  • If SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
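
The evaluation points above can be illustrated with a short Python sketch. It is not Michael's pipeline: the function names are hypothetical, the numbers are made up for illustration, and the G-value is left out because the notes do not define it. The sketch shows a 10-fold split that samples without replacement, how precision moves with the positive/negative ratio for a fixed classifier, and AUC computed directly from its probabilistic definition.

```python
import random


def ten_fold_splits(positives, negatives, k=10, seed=0):
    """Split positives and negatives into k test folds without replacement.

    Every paper lands in exactly one fold, so all papers are sampled once.
    """
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Striding through the shuffled lists gives each fold about 1/k of each class.
    return [(pos[i::k], neg[i::k]) for i in range(k)]


def precision(recall, false_positive_rate, n_pos, n_neg):
    """Expected precision of a classifier with fixed recall and false-positive rate."""
    true_pos = recall * n_pos
    false_pos = false_positive_rate * n_neg
    return true_pos / (true_pos + false_pos)


def auc(pos_scores, neg_scores):
    """Probability that a random positive scores higher than a random negative.

    Ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    folds = ten_fold_splits(range(100), range(100, 400))
    print(len(folds), len(folds[0][0]), len(folds[0][1]))

    # Same classifier (recall 0.8, false-positive rate 0.2), two p/n ratios.
    print(precision(0.8, 0.2, n_pos=100, n_neg=100))
    print(precision(0.8, 0.2, n_pos=100, n_neg=900))

    print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

With recall and false-positive rate held fixed, precision drops from 0.8 to about 0.31 when negatives outnumber positives nine to one, which is the dependence on test-set composition noted above. The AUC function never looks at class sizes, only at pairwise score comparisons.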

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Existing definitions use these words or phrasings:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below the graph; maybe trim? Trimming vertically would limit the view pane when a user zooms in, so leave as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
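
The kind of comparison a gene diff performs can be sketched with set operations on annotation terms. This is not the prototype Raymond and Juancarlos built; the function names and the term IDs below are placeholders chosen for illustration.

```python
def diff_annotations(terms_a, terms_b):
    """Return terms shared by both genes and terms unique to each gene."""
    a, b = set(terms_a), set(terms_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


def jaccard(terms_a, terms_b):
    """Jaccard similarity |A and B| / |A or B|; 1.0 when both sets are empty."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    gene_a = ["TERM:1", "TERM:2", "TERM:3"]  # placeholder term IDs
    gene_b = ["TERM:2", "TERM:3", "TERM:4", "TERM:5"]
    print(diff_annotations(gene_a, gene_b))
    print(jaccard(gene_a, gene_b))
```

A low similarity between two genes expected to be alike does not by itself separate biology from uneven annotation depth, which is the caveat Paul raised.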


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA is not dumped, so no need to worry about it right now
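
The name-to-ID mapping step could look roughly like the Python sketch below. The mapping file layout (two tab-separated columns, name then ID) is an assumption, since the notes do not describe the format Hinxton will send; the strain names and IDs in the example are made up, and the function names are hypothetical.

```python
import csv
import io


def load_strain_map(handle):
    """Read an assumed two-column, tab-separated file: strain name, strain ID."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank, short, and comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        mapping[row[0].strip()] = row[1].strip()
    return mapping


def map_strains(entries, mapping):
    """Replace strain names with IDs; collect names that have no ID yet."""
    mapped, unmapped = [], []
    for name in entries:
        key = name.strip()
        if key in mapping:
            mapped.append(mapping[key])
        else:
            unmapped.append(key)
    return mapped, unmapped


if __name__ == "__main__":
    # Made-up file contents for illustration only.
    text = "# name\tid\nXX1\tWBStrain00000001\nXX2\tWBStrain00000002\n"
    strain_map = load_strain_map(io.StringIO(text))
    print(map_strains(["XX1", " XX2 ", "YY9"], strain_map))
```

Returning the unmapped names separately keeps entries without an ID visible for manual follow-up instead of dropping them silently.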

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating over "is a", "part of" and "regulates"
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
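
A user-selectable relation filter can be sketched as a graph traversal that only follows edges whose relation is in an allowed set. The toy graph and term names below are placeholders, not real ontology terms, and the function name is hypothetical; this is not the SObA implementation.

```python
from collections import deque

# Toy graph: child term -> list of (relation, parent term).
EDGES = {
    "D": [("is a", "B"), ("regulates", "C")],
    "B": [("part of", "A")],
    "C": [("is a", "A")],
}


def ancestors(term, edges, relations=("is a", "part of", "regulates")):
    """Collect all ancestors reachable through the allowed relation types."""
    allowed = set(relations)
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            if relation in allowed and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("D", EDGES)))
    # Excluding "regulates" drops C, which D reaches only via that relation.
    print(sorted(ancestors("D", EDGES, relations=("is a", "part of"))))
```

In this toy graph, dropping "regulates" removes term C from the roll-up while A stays reachable through the "is a" and "part of" path.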

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input


September 26, 2019

Data mining

  • Someone in Paul's lab asked how to retrieve a list of C. elegans orthologs for a list of human genes
  • Could we build a (simple) Alliance tool to do this?
  • Could SimpleMine do this? Could we build a SimpleMine-like tool for Alliance?
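
At its core, such a tool is a batch lookup against a table of ortholog pairs. The Python sketch below assumes the pairs are already loaded as (human gene, C. elegans gene) tuples; it does not call any Alliance or SimpleMine API, the function name is hypothetical, and the gene names are placeholders.

```python
def orthologs_for(human_genes, orthology_pairs):
    """Return C. elegans orthologs per queried human gene, in query order."""
    index = {}
    for human, worm in orthology_pairs:
        index.setdefault(human, []).append(worm)
    # Genes with no known ortholog map to an empty list rather than vanishing.
    return {gene: index.get(gene, []) for gene in human_genes}


if __name__ == "__main__":
    pairs = [  # placeholder names, not real orthology calls
        ("HUMAN_A", "worm-1"),
        ("HUMAN_A", "worm-2"),
        ("HUMAN_B", "worm-3"),
    ]
    print(orthologs_for(["HUMAN_B", "HUMAN_A", "HUMAN_C"], pairs))
```

Keeping unmatched genes in the output with an empty list lets the user see which inputs had no ortholog instead of wondering whether they were processed.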

Strains

  • Paul D generated WBStrains for the missing TransgeneOme objects
  • Working on a pipeline to identify new TransgeneOme strains at each upload
  • One TransgeneOme object had 2 strains. Possible solutions: dump 2 expression objects that differ only in the Strain or remove the UNIQUE tag in the data model
    • Probably best to keep UNIQUE tag
  • Raymond: concerned about automatically generating strains based on imports from the group
  • Many odd strain names are coming from the TransgeneOme group; maybe we ought to have more discussions about generating official (following nomenclature standards) strain names from their imports
  • Quarantine strains on initial import; review and accept them if they pass nomenclature standards
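
A first-pass check to help prioritize review of quarantined imports might look like the Python sketch below. The pattern is an assumption and a simplification: it treats a standard strain name as two or three uppercase letters (the lab designation) followed by a number, and it does not cover historical exceptions. The function name and example names are hypothetical.

```python
import re

# Assumed pattern: 2-3 uppercase letters followed by digits. A simplification.
STRAIN_NAME = re.compile(r"[A-Z]{2,3}\d+")


def triage_imports(names):
    """Split imported names into pattern-conforming and needs-closer-review."""
    conforming, needs_review = [], []
    for name in names:
        if STRAIN_NAME.fullmatch(name):
            conforming.append(name)
        else:
            needs_review.append(name)
    return conforming, needs_review


if __name__ == "__main__":
    print(triage_imports(["AB123", "XYZ7", "odd name-1", "abc12"]))
```

A pattern match only says a name looks standard; a curator would still decide whether to accept it.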

Community phenotype requests August 2019

  • Sent out new round of phenotype requests on August 20, 21, and 22, 2019
  • 2,626 emails/papers requested
  • 114 emails bounced; 5 resent to new addresses
  • 460 Phenotype OA community annotations; 181 RNAi OA annotations (641 annotations total)
  • From 94 papers (83 for Phenotype OA; 33 for RNAi OA; 22 for both)
  • By 81 distinct community curators (70 for Phenotype OA; 32 for RNAi OA; 21 for both)
  • 50 papers flagged as not having phenotypes (40 papers DO have phenotypes; 10 marked as negative; 80% failure rate!)
    • Email states: "If there are no nematode phenotypes in this paper click the following link :"
    • Maybe people are confused, or want to blow off the request
    • Maybe we can programmatically generate short URLs for the link? May be difficult
  • 4 papers flagged for phenotypes (only 2 had curatable phenotypes; 1 had honey-induced phenotypes)
  • 115 papers with responses (5% response rate); 24 papers received input that was not the main focus of the request
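
The figures above can be cross-checked with a few lines of Python. The variable names are illustrative, and the choice of denominator for the response rate (all 2,626 requests) is an assumption. With that denominator the rate comes out nearer 4.4%, so the 5% in the notes is presumably rounded up or based on a smaller denominator, such as delivered emails only.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# 40 of the 50 papers flagged as lacking phenotypes actually had them.
failure_rate = percent(40, 50)

# Assumed denominator: every email/paper requested.
response_rate = percent(115, 2626)

# Inclusion-exclusion: total = A + B - (both).
papers_total = 83 + 33 - 22
curators_total = 70 + 32 - 21
annotations_total = 460 + 181

if __name__ == "__main__":
    print(failure_rate, round(response_rate, 1))
    print(papers_total, curators_total, annotations_total)
```

The paper, curator, and annotation totals reproduce the 94, 81, and 641 reported in the notes.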