WormBase-Caltech Weekly Calls September 2019

September 12, 2019

Update on SVM pipeline

New SVM pipeline: more analysis and more parameter tuning
Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
In the example shown, a "dumb" machine starts out with precision above 0.6
G-value (Michael's invention); does not depend on distribution of sets
Applied to various data types
Analysis: 10-fold cross validation
- Randomly select 10% of positives and negatives (without replacement) and repeat until all papers are sampled
F-value changes over different p/n values; G-value does not (essentially flat)
Area Under the Curve (AUC): probability that a random positive scores higher than a random negative
AUC values for many WB data types are in the upper 80s into the 90s (percent)
Ranjana: How many papers for a good training set? Michael: we don't know yet
Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
If SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
Michael can provide training sets he has used recently
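
The evaluation points above can be illustrated with a minimal Python sketch. It is not Michael's pipeline and does not implement the G-value, which these notes do not define. The function names, scores, and set sizes are illustrative assumptions. The sketch shows why precision tracks the positive/negative ratio while AUC, computed as the pairwise probability described above, does not, and how papers can be dealt into 10 folds without replacement.

```python
import random


def auc_pairwise(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive scores
    higher; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def precision_at_ratio(tpr, fpr, n_pos, n_neg):
    """Expected precision for fixed true/false positive rates and a given
    number of positives and negatives in the test set."""
    true_pos = tpr * n_pos
    false_pos = fpr * n_neg
    called = true_pos + false_pos
    return true_pos / called if called else 0.0


def stratified_folds(pos, neg, k=10, seed=0):
    """Shuffle positives and negatives separately and deal them into k folds,
    so each paper is sampled exactly once (without replacement)."""
    rng = random.Random(seed)
    pos, neg = list(pos), list(neg)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]


# A classifier that calls every paper positive (tpr = fpr = 1.0):
# its precision simply equals the fraction of positives in the test set.
for n_pos, n_neg in [(60, 40), (30, 70), (10, 90)]:
    print(n_pos, n_neg, precision_at_ratio(1.0, 1.0, n_pos, n_neg))

# AUC is unchanged when the negative set is made three times larger.
pos_scores = [0.9, 0.7, 0.4]   # illustrative scores, not real SVM output
neg_scores = [0.8, 0.3, 0.2, 0.1]
print(auc_pairwise(pos_scores, neg_scores))
print(auc_pairwise(pos_scores, neg_scores * 3))
```

The pairwise loop is quadratic in the number of papers, which is fine for a sketch; a rank-based computation would be the usual choice for large test sets.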

Clarifying definitions of "defective" and "deficient" for phenotypes

WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
Existing term definitions include these words or meanings:
- "Variations in the ability"
- "aberrant"
- "defect"
- "defective"
- "defects"
- "deficiency"
- "deficient"
- "disrupted"
- "impaired"
- "incompetent"
- "ineffective"
- "perturbation that disrupts"
- Failure to execute the characteristic response = abnormal?
- abnormal
- abnormality leading to specific outcomes
- fail to exhibit the same taxis behavior = abnormal?
- failure
- failure OR delayed
- failure, slower OR late
- failure/abnormal
- reduced
- slower

Citace upload

- Tuesday, Sep 24th

Strain to ID mapping

Waiting on Hinxton to send strain ID mapping file?
Hopefully we can all get that well before the upload deadline
Will do global replacement at time of citace upload (at least for now)

New name server

When will this officially go live?
Will we now be able to request strain IDs through the server? Yes

SObA Graphs

New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
- Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology

September 19, 2019

Strains

Need to wait for new strain IDs from Hinxton before running dumping scripts
Don't edit multi-ontology strain fields in OA for now!
Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
"Requested strain" field in Disease OA is not dumped, so no need to worry about it right now
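
A minimal sketch of the planned mapping from strain-name entries to strain IDs, assuming a tab-separated mapping file of ID and name. The real file format from Hinxton is not described in these notes, and the IDs, function names, and entries below are placeholders. Entries that cannot be mapped are collected for manual review rather than silently dropped.

```python
import csv
import io

# Hypothetical mapping file contents: tab-separated strain ID and strain name.
# The IDs below are placeholders, not real WormBase identifiers.
MAPPING_TSV = "WBStrainEXAMPLE1\tN2\nWBStrainEXAMPLE2\tCB4856\n"


def load_mapping(handle):
    """Read 'ID<TAB>name' rows into a name -> ID dictionary."""
    return {name.strip(): strain_id.strip()
            for strain_id, name in csv.reader(handle, delimiter="\t")}


def map_strains(entries, name_to_id):
    """Split entries into (entry, ID) pairs and a list of unmapped entries
    that need manual review."""
    mapped, unmapped = [], []
    for entry in entries:
        strain_id = name_to_id.get(entry.strip())
        if strain_id is None:
            unmapped.append(entry)
        else:
            mapped.append((entry, strain_id))
    return mapped, unmapped


name_to_id = load_mapping(io.StringIO(MAPPING_TSV))
mapped, unmapped = map_strains(["N2", " CB4856 ", "unknown strain"], name_to_id)
print(mapped)
print(unmapped)
```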

Alliance literature curation

Working group will be formed soon
Will work out general common pipelines for literature curation

SObA Graph relations

Currently only integrating over "is a", "part of" and "regulates"
Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
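
The option of letting users choose which relations to integrate over can be sketched as a relation-filtered ancestor traversal. This is not the SObA implementation; the graph, term names, and relation labels below are a toy assumption used only to show how excluding "regulates" changes the set of terms an annotation rolls up to.

```python
from collections import deque


def ancestors(term, edges, relations):
    """Return all terms reachable from `term` by following only edges whose
    relation is in `relations`. `edges` maps child -> [(relation, parent)]."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Toy ontology fragment (hypothetical terms, not real ontology content).
EDGES = {
    "regulation of X": [("is_a", "biological regulation"), ("regulates", "X")],
    "X": [("is_a", "process"), ("part_of", "Y")],
    "Y": [("is_a", "process")],
    "biological regulation": [("is_a", "process")],
}

all_relations = ancestors("regulation of X", EDGES, {"is_a", "part_of", "regulates"})
no_regulates = ancestors("regulation of X", EDGES, {"is_a", "part_of"})
print(sorted(all_relations))
print(sorted(no_regulates))
```

With "regulates" included, an annotation to the regulation term also counts toward the regulated process and its parents; with it excluded, the annotation stays within the regulation branch.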

Author First Pass

Putting together paper for AFP
Reviewing all user input for paper
Asking individual curators to check input

September 26, 2019

Data mining

Someone in Paul's lab asked how to retrieve the C. elegans orthologs for a list of human genes
Could we build a (simple) Alliance tool to do this?
Could SimpleMine do this? Could we build a SimpleMine-like tool for Alliance?
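
A minimal sketch of what such a batch lookup could look like, assuming an orthology table is available as a two-column tab-separated file. The column names, function names, and the tiny table below are illustrative assumptions and do not represent SimpleMine or any Alliance download format.

```python
import csv
import io
from collections import defaultdict

# Toy orthology table for illustration only (assumed column names).
TOY_ORTHOLOGS = (
    "human_gene\tworm_gene\n"
    "INSR\tdaf-2\n"
    "IGF1R\tdaf-2\n"
    "TP53\tcep-1\n"
)


def load_orthologs(handle):
    """Build a human symbol -> set of worm genes lookup from a TSV handle."""
    table = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        table[row["human_gene"].upper()].add(row["worm_gene"])
    return table


def worm_orthologs(human_genes, table):
    """Return {input symbol: sorted worm orthologs}; unmatched symbols map to
    an empty list so the user can see which genes had no hit."""
    return {gene: sorted(table.get(gene.upper(), ())) for gene in human_genes}


table = load_orthologs(io.StringIO(TOY_ORTHOLOGS))
print(worm_orthologs(["INSR", "tp53", "NOT_A_GENE"], table))
```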

Strains

Paul D generated WBStrains for the missing TransgeneOme objects
Working on a pipeline to identify new TransgeneOme strains at each upload
One TransgeneOme object had 2 strains. Possible solutions: dump 2 expression objects that differ only in the Strain or remove the UNIQUE tag in the data model
- Probably best to keep UNIQUE tag
Raymond: concerned about automatically generating strains based on imports from the group
Many odd strain names are coming from the TransgeneOme group; we may need more discussion about generating official strain names (following nomenclature standards) from their imports
Quarantine strains on initial import; review and accept them if they pass the standards

Community phenotype requests August 2019

Sent out new round of phenotype requests on August 20, 21, and 22, 2019
2,626 emails/papers requested
114 emails bounced; 5 resent to new addresses
460 Phenotype OA community annotations; 181 RNAi OA annotations (641 annotations total)
From 94 papers (83 for Phenotype OA; 33 for RNAi; 22 for both)
By 81 distinct community curators (70 for Phenotype OA; 32 for RNAi OA; 21 for both)
50 papers flagged as not having phenotypes (40 papers DO have phenotypes; 10 marked as negative; 80% failure rate!)
- Email states: "If there are no nematode phenotypes in this paper click the following link :"
- Maybe people are confused, or want to blow off the request
- Maybe we can programmatically generate short URLs for the link? May be difficult
- Provide a link to correct mistakes on confirmation page
4 papers flagged for phenotypes (only 2 had curatable phenotypes; 1 had honey-induced phenotypes)
115 papers with responses (5% response); 24 papers with input that were not main focus of request
Can we provide an opt-out link?
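
The counts reported above can be cross-checked with a few lines of arithmetic. The variable names are illustrative; all numbers come from the notes. Papers and curators that appear in both the Phenotype OA and RNAi OA counts are subtracted once so they are not double-counted.

```python
# Annotations: Phenotype OA + RNAi OA
total_annotations = 460 + 181

# Papers and curators: add both groups, subtract the overlap once
total_papers = 83 + 33 - 22
total_curators = 70 + 32 - 21

# Papers flagged "no phenotypes" that actually do have phenotypes
wrong_flag_rate = 40 / 50

# Papers with responses out of all emails/papers requested
response_rate = 115 / 2626

print(total_annotations, total_papers, total_curators)
print(f"wrong 'no phenotype' flags: {wrong_flag_rate:.0%}")
print(f"response rate: {response_rate:.1%}")
```

The totals match the reported 641 annotations, 94 papers, and 81 curators, and the 80% figure follows from 40 of 50 flags. The response rate works out to about 4.4% of the 2,626 requests, so the 5% quoted above is presumably a rounded figure.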

Comparison SObA

Actually quite complicated; may require more consideration

SObA graph and Ontology Browser for papers

May be able to modify/hack existing tools for genes and apply to papers
Paper-term matching powered by Textpresso
