WormBase-Caltech Weekly Calls
GoToMeeting link: https://www.gotomeet.me/wormbase1
September 12, 2019
Update on SVM pipeline
- New SVM pipeline: more analysis and more parameter tuning
- Avoiding precision (and F-value) as a measure: both depend on the ratio of positives to negatives in the test set
- In the example shown, a "dumb" machine starts out with precision above 0.6
- G-value (Michael's invention); does not depend on distribution of sets
- Applied to various data types
- Analysis: 10-fold cross validation
- Randomly select 10% of the positives and 10% of the negatives (without replacement) and repeat until all papers have been sampled
- F-value changes over different p/n values; G-value does not (essentially flat)
- Area Under the Curve (AUC): probability that a random positive scores higher than a random negative
- AUC values for many WB data types are in the upper 80s to the 90s (percent)
- Ranjana: How many papers for a good training set? Michael: we don't know yet
- Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
- If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
- Michael can provide training sets he has used recently
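The evaluation points above can be illustrated with a minimal Python sketch. It is not Michael's pipeline code: the function names are hypothetical, and "dumb" machine is assumed here to mean a classifier that calls every paper positive. The G-value is not shown because its formula is not given in these notes.

```python
import random


def stratified_folds(pos_ids, neg_ids, k=10, seed=0):
    """Deal positives and negatives into k disjoint folds.

    Each fold gets roughly 1/k of the positives and 1/k of the negatives,
    sampled without replacement, so every paper lands in exactly one fold.
    """
    rng = random.Random(seed)
    pos, neg = list(pos_ids), list(neg_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]


def dumb_precision(n_pos, n_neg):
    """Precision of a classifier that labels everything positive (assumption).

    Every positive is a true positive and every negative a false positive,
    so precision equals the fraction of positives in the test set.
    """
    return n_pos / (n_pos + n_neg)


def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.

    Computed by brute force over all (positive, negative) pairs;
    ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    folds = stratified_folds(range(100), range(100, 150))
    print(len(folds), len(folds[0][0]), len(folds[0][1]))
    # Same "dumb" classifier, different test-set ratios, different precision.
    print(dumb_precision(60, 40), dumb_precision(10, 90))
    print(auc([0.9, 0.3], [0.5, 0.1]))
```

Under that assumption, the precision of the "dumb" machine simply tracks the positive fraction of the test set, which is why precision and F-value shift with the p/n ratio, while the pairwise AUC only compares scores of positives against scores of negatives.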
Clarifying definitions of "defective" and "deficient" for phenotypes
- WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
- Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
- What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
- Definitions include meanings or words:
- "Variations in the ability"
- "perturbation that disrupts"
- Failure to execute the characteristic response = abnormal?
- abnormality leading to specific outcomes
- fail to exhibit the same taxis behavior = abnormal?
- failure OR delayed
- failure, slower OR late
- Tuesday, Sep 24th
Strain to ID mapping
- Waiting on Hinxton to send strain ID mapping file?
- Hopefully we can all get that well before the upload deadline
- Will do global replacement at time of citace upload (at least for now)
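A global replacement of strain names by IDs could look roughly like the sketch below. Everything about the input is an assumption: the mapping file format (tab-separated name and ID) has not been received from Hinxton yet, the function names are hypothetical, and the IDs in the example are placeholders rather than real assignments.

```python
import re


def load_mapping(lines):
    """Parse 'strain_name<TAB>strain_id' lines (assumed format) into a dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, strain_id = line.split("\t")
        mapping[name] = strain_id
    return mapping


def replace_strains(text, mapping):
    """Replace whole strain names in text with their IDs.

    Lookarounds keep a name such as N2 from matching inside a longer
    token such as N2A or XN2.
    """
    if not mapping:
        return text
    # Longest names first so the alternation prefers the most specific match.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(re.escape(n) for n in names) + r")(?![\w-])"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    mapping = load_mapping(["N2\tWBStrain00000001", "CB1370\tWBStrain00000002"])
    print(replace_strains('Strain "N2"\nStrain "N2A"', mapping))
```

A real run over upload files would probably need to restrict the replacement to strain fields, since a short name like N2 can also appear in free text.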
New name server
- When will this officially go live?
- Will we now be able to request strain IDs through the server? Yes
- New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
- A lot of whitespace padding above and below the graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
- Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
- Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology