Revision as of 18:14, 12 September 2019

Previous Years

New SVM pipeline: more analysis and more parameter tuning
Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
Even a "dumb" machine starts out with precision above 0.6
G-value (Michael's invention); does not depend on distribution of sets
Applied to various data types
Analysis: 10-fold cross validation
- Randomly select 10% of the positives and negatives (without replacement) and repeat until all papers have been sampled
F-value changes over different p/n values; G-value does not (essentially flat)
Area Under the Curve (AUC): probability that a random positive scores higher than a random negative
AUC values for many WB data types range from the upper 80s into the 90s (percent)
Ranjana: How many papers for a good training set? Michael: we don't know yet
Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
If the SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
Michael can provide training sets he has used recently
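
The evaluation points above can be illustrated with a minimal Python sketch. It is not Michael's pipeline: the function names (`precision_at_ratio`, `auc_pairwise`, `ten_fold_splits`) and all numbers are hypothetical, chosen only to show (1) why precision shifts with the positive/negative ratio of the test set, (2) AUC computed directly as the probability that a random positive outscores a random negative, and (3) one way to draw 10 folds without replacement. The G-value is not defined in these notes, so it is not sketched here.

```python
import random


def precision_at_ratio(tpr, fpr, n_pos, n_neg):
    """Expected precision of a classifier with fixed true/false positive
    rates when evaluated on n_pos positives and n_neg negatives."""
    tp = tpr * n_pos  # expected true positives
    fp = fpr * n_neg  # expected false positives
    return tp / (tp + fp)


def auc_pairwise(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive scores higher; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def ten_fold_splits(pos, neg, k=10, seed=0):
    """Shuffle positives and negatives separately, then deal each into
    k folds so every paper lands in exactly one fold (assumed scheme)."""
    rng = random.Random(seed)
    pos, neg = list(pos), list(neg)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]
```

For instance, a hypothetical classifier with a true positive rate of 0.9 and a false positive rate of 0.1 has an expected precision of 0.9 on a balanced test set of 50 positives and 50 negatives, but only 0.5 on a set of 10 positives and 90 negatives. Likewise, a classifier that labels every paper positive has precision equal to the fraction of positives in the test set. The pairwise AUC depends only on how positives rank relative to negatives, so replicating either class does not change it.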

WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
Existing term definitions include the following words or meanings:
- "aberrant"
- "defective"
- "defect"
- "defects"
- "deficiency"
- "disrupted"
- "ineffective"
- "perturbation that disrupts"
- "variations in the ability"
- failure to execute the characteristic response = abnormal?
- abnormal
- abnormality leading to specific outcomes
- fail to exhibit the same taxis behavior = abnormal?
- failure
- failure OR delayed
- failure/abnormal

@@ Line 54: / Line 54: @@
 * F-value changes over different p/n values; G-value does not (essentially flat)
 * Area Under the Curve (AUC): probability that a random positive scores higher than random negative
-* AUC values for many data types upper 80%'s into 90%'s
+* AUC values for many WB data types upper 80%'s into 90%'s
+* Ranjana: How many papers for a good training set? Michael: we don't know yet
+* Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
+* If SVM still not good enough, Michael will work on deep neural networks (Tensor Flow)
+* Michael can provide training sets he has used recently
 === Clarifying definitions of "defective" and "deficient" for phenotypes ===