Difference between revisions of "WormBase-Caltech Weekly Calls"
From WormBaseWiki
Revision as of 16:39, 19 September 2019
Previous Years
GoToMeeting link: https://www.gotomeet.me/wormbase1
2019 Meetings
September 12, 2019
Update on SVM pipeline
- New SVM pipeline: more analysis and more parameter tuning
- Avoiding precision (and F-value) as a performance measure, since both depend on the ratio of positives to negatives in the test set
- In the example shown, a "dumb" machine starts out with precision above 0.6
- G-value (Michael's invention); does not depend on distribution of sets
- Applied to various data types
- Analysis: 10-fold cross validation
  - Randomly select 10% of positives and negatives (without replacement) and repeat until all papers are sampled
- F-value changes over different p/n values; G-value does not (essentially flat)
- Area Under the Curve (AUC): probability that a random positive scores higher than a random negative
- AUC values for many WB data types are in the upper 80s into the 90s (percent)
- Ranjana: How many papers for a good training set? Michael: we don't know yet
- Can't reproduce old training sets (for old SVM); provide Michael with better training sets if you want an improved SVM
- If SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
- Michael can provide training sets he has used recently
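The points above about evaluation can be illustrated with a minimal Python sketch. It is not Michael's pipeline, and it does not attempt the G-value, whose definition is not given in these notes. The function names and all numbers are hypothetical and chosen only for illustration. The sketch shows (1) splitting positives and negatives into 10 folds without replacement, (2) how precision changes with the positive/negative ratio even when the classifier's true and false positive rates are fixed, and (3) AUC computed directly from its pairwise definition.

```python
import random


def stratified_folds(positives, negatives, k=10, seed=0):
    """Split each class into k folds, sampling without replacement."""
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Fold i takes every k-th item, so each fold holds about 1/k of each class
    # and every paper lands in exactly one fold.
    return [(pos[i::k], neg[i::k]) for i in range(k)]


def precision_from_rates(tpr, fpr, n_pos, n_neg):
    """Precision of a classifier with fixed TPR/FPR on a given test-set mix."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def auc_pairwise(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # A "dumb" machine that calls every paper positive has TPR = FPR = 1,
    # so its precision is just the fraction of positives in the test set.
    print(precision_from_rates(1.0, 1.0, n_pos=60, n_neg=40))
    print(precision_from_rates(1.0, 1.0, n_pos=10, n_neg=90))
    # Made-up classifier scores for three positives and two negatives.
    print(auc_pairwise([0.9, 0.8, 0.4], [0.7, 0.3]))
```

The pairwise AUC loop is quadratic in the number of papers, which is fine for illustration; a rank-based computation would be the usual choice for large sets.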
Clarifying definitions of "defective" and "deficient" for phenotypes
- WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
- Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
- What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
- Definitions include meanings or words:
  - "Variations in the ability"
  - "aberrant"
  - "defect"
  - "defective"
  - "defects"
  - "deficiency"
  - "deficient"
  - "disrupted"
  - "impaired"
  - "incompetent"
  - "ineffective"
  - "perturbation that disrupts"
  - Failure to execute the characteristic response = abnormal?
  - abnormal
  - abnormality leading to specific outcomes
  - fail to exhibit the same taxis behavior = abnormal?
  - failure
  - failure OR delayed
  - failure, slower OR late
  - failure/abnormal
  - reduced
  - slower
Citace upload
- Tuesday, Sep 24th
Strain to ID mapping
- Waiting on Hinxton to send the strain ID mapping file?
- Hopefully we can all get it well before the upload deadline
- Will do global replacement at time of citace upload (at least for now)
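As a rough sketch of what a global replacement could look like, the Python below maps strain names to strain IDs in a block of text. The tab-separated file layout, the function names, and the IDs shown are all assumptions made for illustration; the actual mapping file from Hinxton and the real dumping scripts may look quite different.

```python
import re


def load_mapping(lines):
    """Parse 'strain_name<TAB>strain_id' lines into a dict (assumed layout)."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, strain_id = line.split("\t")
        mapping[name] = strain_id
    return mapping


def replace_strains(text, mapping):
    """Replace whole-token strain names with IDs; unknown names are untouched."""
    if not mapping:
        return text
    # Try longer names first so a short name never shadows a longer one.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(map(re.escape, names)) + r")(?![\w-])"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)


if __name__ == "__main__":
    # Placeholder IDs, not taken from any real mapping file.
    mapping = load_mapping([
        "# name\tid",
        "N2\tWBStrain00000001",
        "CB4856\tWBStrain00000002",
    ])
    print(replace_strains('Strain "N2" and CB4856; not N2A', mapping))
```

Matching on whole tokens keeps a name such as N2 from being rewritten inside a longer, unrelated token.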
New name server
- When will this officially go live?
- Will we now be able to request strain IDs through the server? Yes
SObA Graphs
- New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
- There is a lot of whitespace padding above and below the graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
- Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
  - Paul: compared two genes that should be very similar, but there are a lot of differences; this may reflect annotation coverage rather than biology
September 19, 2019
Strains
- Need to wait for new strain IDs from Hinxton before running dumping scripts
- Don't edit multi-ontology strain fields in OA for now!
- Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
- "Requested strain" field in Disease OA is not dumped, so we don't need to worry about it right now
Alliance literature curation
- Working group will be formed soon
- Will work out general common pipelines for literature curation
SObA Graph relations
- Currently only integrating over "is a", "part of" and "regulates"
- Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
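To make the idea of a user-selectable relation set concrete, here is a small Python sketch, assuming the ontology is available as (child, relation, parent) triples. This is not the SObA implementation; the term names, relation identifiers, and the `ancestors` function are invented for illustration.

```python
from collections import deque

# Toy ontology edges: (child, relation, parent). All term names are made up.
EDGES = [
    ("term_C", "is_a", "term_B"),
    ("term_B", "part_of", "term_A"),
    ("term_D", "regulates", "term_B"),
    ("term_E", "is_a", "term_D"),
]


def ancestors(term, edges, relations):
    """Return all terms reachable from `term` via the chosen relation types."""
    parents = {}
    for child, rel, parent in edges:
        if rel in relations:
            parents.setdefault(child, set()).add(parent)
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    all_relations = {"is_a", "part_of", "regulates"}
    print(sorted(ancestors("term_E", EDGES, all_relations)))
    # Excluding "regulates" stops the roll-up at term_D.
    print(sorted(ancestors("term_E", EDGES, all_relations - {"regulates"})))
```

Passing the relation set as a parameter means the same traversal serves both the current behaviour and an option that drops "regulates".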
Author First Pass
- Putting together paper for AFP
- Reviewing all user input for paper
- Asking individual curators to check input