Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
== September 12, 2019 ==
=== Update on SVM pipeline ===
* New SVM pipeline: more analysis and more parameter tuning
* Avoiding precision (and F-value) as a measure, because both depend on the ratio of positives to negatives in the test set
* In the example shown, even a "dumb" machine starts out with a precision above 0.6
* G-value (Michael's invention) does not depend on the positive/negative distribution of the sets
* Applied to various data types
* Analysis: 10-fold cross-validation
** Randomly select 10% of positives and negatives (without replacement) and repeat until all papers are sampled
* F-value changes across different positive/negative (p/n) ratios; G-value does not (essentially flat)
* Area Under the (ROC) Curve (AUC): probability that a randomly chosen positive scores higher than a randomly chosen negative
* AUC values for many WB data types range from the high 80s into the 90s (in %)
* Ranjana: How many papers for a good training set? Michael: we don't know yet
* The old training sets (for the old SVM) cannot be reproduced; curators who want an improved SVM should provide Michael with better training sets
* If the SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
* Michael can provide training sets he has used recently
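The points above about precision, F-value, AUC and cross-validation can be illustrated with a small Python sketch. It is not Michael's pipeline and does not implement his G-value (whose definition is not given in these notes). It assumes a hypothetical classifier with a fixed true-positive rate (TPR) and false-positive rate (FPR) to show that precision and F1 shift as the positive/negative ratio changes, computes AUC directly from its probabilistic definition, and assigns papers to 10 folds without replacement. All function names and numbers are illustrative.

```python
import random

def precision_f1(tpr, fpr, n_pos, n_neg):
    """Expected precision and F1 for a classifier with a fixed TPR and FPR."""
    tp = tpr * n_pos  # positives correctly flagged
    fp = fpr * n_neg  # negatives wrongly flagged
    precision = tp / (tp + fp)
    recall = tpr
    f1 = 2 * precision * recall / (precision + recall)
    return precision, f1

def auc(pos_scores, neg_scores):
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def ten_fold_split(pos_ids, neg_ids, k=10, seed=0):
    """Shuffle positives and negatives separately, then deal them into k folds.

    Every paper lands in exactly one fold (sampling without replacement), and
    each fold keeps roughly the overall positive/negative ratio.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for ids in (pos_ids, neg_ids):
        shuffled = list(ids)
        rng.shuffle(shuffled)
        for i, paper in enumerate(shuffled):
            folds[i % k].append(paper)
    return folds

if __name__ == "__main__":
    # Same hypothetical classifier (TPR 0.8, FPR 0.1) on test sets with different p/n ratios
    for n_pos, n_neg in [(100, 100), (100, 1000), (1000, 100)]:
        p, f = precision_f1(0.8, 0.1, n_pos, n_neg)
        print(f"p/n = {n_pos}/{n_neg}: precision={p:.2f}, F1={f:.2f}")
    print(f"AUC = {auc([0.9, 0.8, 0.4], [0.7, 0.3]):.3f}")
```

With identical TPR and FPR, precision comes out at about 0.89, 0.44 and 0.99 for the three test sets, while the AUC example (5 of 6 positive/negative pairs ranked correctly) does not depend on how many positives and negatives there are. Similarly, a classifier that labels every paper positive (TPR = FPR = 1) has precision equal to the fraction of positives in the test set, which is one way a "dumb" machine can already score well on precision.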
=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
* Existing definitions use the following words or meanings:
** "Variations in the ability"
** "aberrant"
** "defect"
** "defective"
** "defects"
** "deficiency"
** "deficient"
** "disrupted"
** "impaired"
** "incompetent"
** "ineffective"
** "perturbation that disrupts"
** Failure to execute the characteristic response = abnormal?
** abnormal
** abnormality leading to specific outcomes
** fail to exhibit the same taxis behavior = abnormal?
** failure
** failure OR delayed
** failure, slower OR late
** failure/abnormal
** reduced
** slower
=== Citace upload ===
* Tuesday, September 24
=== Strain to ID mapping ===
* Are we still waiting on Hinxton to send the strain ID mapping file?
* Hopefully we will all have it well before the upload deadline
* Will do global replacement at time of citace upload (at least for now)
=== New name server ===
* When will this officially go live?
* Will we now be able to request strain IDs through the server? Yes
=== SObA Graphs ===
* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
* There is a lot of whitespace padding above and below the graph; maybe trim it? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
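The core of a gene-to-gene comparison can be sketched as set operations over the ontology terms annotated to each gene. This hypothetical Python helper is not Raymond and Juancarlos's prototype, and the term IDs are placeholders. Because it compares only what has been annotated, a large difference can, as Paul noted, reflect annotation coverage rather than biology.

```python
def diff_annotations(terms_a, terms_b):
    """Compare two genes' annotation term sets (hypothetical helper)."""
    a, b = set(terms_a), set(terms_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard similarity: 1.0 means identical term sets
        "jaccard": len(shared) / len(union) if union else 1.0,
    }

# Placeholder term IDs for two hypothetical genes
gene_x = ["TERM:0001", "TERM:0002", "TERM:0003"]
gene_y = ["TERM:0002", "TERM:0003", "TERM:0004", "TERM:0005"]
print(diff_annotations(gene_x, gene_y))
```

This compares directly annotated terms only; comparing after propagating each gene's terms to their ontology ancestors, as the SObA graphs do, would usually show more overlap.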
== September 19, 2019 ==
=== Strains ===
* Need to wait for new strain IDs from Hinxton before running dumping scripts
* Don't edit multi-ontology strain fields in OA for now!
* Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
* The "Requested strain" field in the Disease OA is not dumped, so we don't need to worry about it right now
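Once the complete mapping file is available, mapping free-text strain entries to strain IDs amounts to a dictionary lookup that also reports unmatched names for manual review. This Python sketch assumes a hypothetical two-column, tab-separated file (strain ID, strain name); the real Hinxton file format may differ, and the IDs shown are made up.

```python
import csv
import io

def load_mapping(handle):
    """Read 'ID<TAB>name' rows into a name -> ID dictionary (assumed file format)."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            strain_id, name = row[0].strip(), row[1].strip()
            mapping[name] = strain_id
    return mapping

def map_strains(entries, mapping):
    """Return (mapped, unmatched); unmatched names need curator review.

    Matching is exact after trimming whitespace; case variants are not normalized.
    """
    mapped, unmatched = {}, []
    for entry in entries:
        name = entry.strip()
        if name in mapping:
            mapped[name] = mapping[name]
        else:
            unmatched.append(name)
    return mapped, unmatched

# Illustrative mapping file contents; the IDs are invented for this example
example = io.StringIO("WBStrain00000001\tN2\nWBStrain00000002\tCB1489\n")
mapping = load_mapping(example)
print(map_strains(["N2", " CB1489", "XY123"], mapping))
```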
=== Alliance literature curation ===
* Working group will be formed soon
* Will work out general common pipelines for literature curation
=== SObA Graph relations ===
* Currently only integrating over "is a", "part of" and "regulates"
* Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
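Letting users choose which relations to integrate over can be sketched as an ancestor traversal that only follows edges of the selected relation types. This Python sketch uses a tiny hypothetical ontology with made-up term names; it is not the SObA implementation.

```python
from collections import defaultdict, deque

# Hypothetical edges: (child, relation, parent)
EDGES = [
    ("neg_reg_of_growth", "is_a", "reg_of_growth"),
    ("reg_of_growth", "is_a", "biological_regulation"),
    ("reg_of_growth", "regulates", "growth"),
    ("growth", "is_a", "biological_process"),
]

def ancestors(term, edges, allowed=("is_a", "part_of", "regulates")):
    """Breadth-first search upward, following only the allowed relation types."""
    parents = defaultdict(list)
    for child, rel, parent in edges:
        if rel in allowed:
            parents[child].append(parent)
    seen, queue = set(), deque([term])
    while queue:
        for parent in parents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# All three relations vs. excluding "regulates"
print(sorted(ancestors("neg_reg_of_growth", EDGES)))
print(sorted(ancestors("neg_reg_of_growth", EDGES, allowed=("is_a", "part_of"))))
```

Excluding "regulates" drops "growth" and "biological_process" from the ancestors, so annotations to a regulation term would no longer be counted toward the process it regulates.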
=== Author First Pass ===
* Putting together a paper on AFP
* Reviewing all user input for the paper
* Asking individual curators to check the input
== September 26, 2019 ==
=== Data mining ===
* Someone in Paul's lab is asking to retrieve a list of C. elegans orthologs for a list of human genes
* Could we build a (simple) Alliance tool to do this?
* Could SimpleMine do this? Could we build a SimpleMine-like tool for Alliance?
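A SimpleMine-style batch query of this kind reduces to joining the user's gene list against an ortholog table. This Python sketch assumes a hypothetical in-memory list of (human gene, C. elegans gene) pairs; a real tool would load them from an Alliance or WormBase ortholog file, whose format is not specified here. The three pairs are well-known examples for illustration, not a complete ortholog set.

```python
from collections import defaultdict

def build_index(pairs):
    """Index (human_gene, worm_gene) pairs by human gene symbol (case-insensitive)."""
    index = defaultdict(set)
    for human, worm in pairs:
        index[human.upper()].add(worm)
    return index

def lookup(human_genes, index):
    """Map each queried human gene to its sorted worm orthologs ([] if none found)."""
    return {gene: sorted(index.get(gene.upper(), set())) for gene in human_genes}

# Illustrative pairs only
PAIRS = [("INSR", "daf-2"), ("PTEN", "daf-18"), ("TP53", "cep-1")]
index = build_index(PAIRS)
print(lookup(["TP53", "pten", "NOT_IN_TABLE"], index))
```

Returning an empty list for unmatched genes, rather than dropping them, keeps the output aligned with the user's input list, which is how a batch lookup tool is typically read.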
=== Strains ===
* Paul D generated WBStrains for the missing TransgeneOme objects
* Working on a pipeline to identify new TransgeneOme strains at each upload
* One TransgeneOme object had 2 strains. Possible solutions: dump 2 expression objects that differ only in the Strain, or remove the UNIQUE tag from the data model
** Probably best to keep UNIQUE tag
* Raymond is concerned about automatically generating strains based on imports from the TransgeneOme group
* Many odd strain names are coming from the TransgeneOme group; maybe we ought to have more discussions about generating official (following nomenclature standards) strain names from their imports
* Quarantine strains on initial import; review them and accept those that pass the standards
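A first-pass quarantine step could flag imported strain names that do not match the nomenclature pattern before a curator reviews them. The regular expression below is a simplified assumption (two or three uppercase letters for the laboratory designation, followed by a number). The full C. elegans nomenclature has exceptions, such as the reference strain N2, so this is a triage sketch, not a validator.

```python
import re

# Simplified, assumed pattern: 2-3 uppercase lab-designation letters followed by digits
STRAIN_PATTERN = re.compile(r"[A-Z]{2,3}[0-9]+")

def triage(names):
    """Split imported names into provisionally OK and quarantined lists.

    Both lists still go to a curator; this only decides which names need a closer look.
    """
    looks_ok, quarantined = [], []
    for name in names:
        target = looks_ok if STRAIN_PATTERN.fullmatch(name) else quarantined
        target.append(name)
    return looks_ok, quarantined

print(triage(["CB1489", "AB123", "N2", "odd strain name"]))
```

Note that N2 lands in quarantine, which is why a real check would need an allowlist of legacy names alongside the pattern.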
=== Community phenotype requests August 2019 ===
* Sent out new round of phenotype requests on August 20, 21, and 22, 2019
* 2,626 emails/papers requested
* 114 emails bounced; 5 resent to new addresses
* 460 Phenotype OA community annotations; 181 RNAi OA annotations (641 annotations total)
* From 94 papers (83 for Phenotype OA; 33 for RNAi; 22 for both)
* By 81 distinct community curators (70 for Phenotype OA; 32 for RNAi OA; 21 for both)
* 50 papers flagged as not having phenotypes (on review, 40 of these papers DO have phenotypes and only 10 are truly negative; 80% failure rate!)
** Email states: "If there are no nematode phenotypes in this paper click the following link :"
** Maybe people are confused, or want to blow off the request
** Maybe we can programmatically generate short URLs for the link? May be difficult
** Provide a link to correct mistakes on confirmation page
* 4 papers flagged for phenotypes (only 2 had curatable phenotypes; 1 had honey-induced phenotypes)
* 115 papers with responses (5% response); 24 papers with input that were not main focus of request
* Can we provide an opt-out link?
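The totals in the summary above can be cross-checked with inclusion-exclusion (papers or curators counted for both OAs are subtracted once) and simple rates. This short Python check uses only numbers from these notes.

```python
phenotype_oa, rnai_oa = 460, 181
papers = 83 + 33 - 22       # papers with Phenotype OA and/or RNAi OA input
curators = 70 + 32 - 21     # distinct community curators
failure_rate = 40 / 50      # "no phenotype" flags on papers that do have phenotypes
response_rate = 115 / 2626  # papers with responses / papers requested

print(phenotype_oa + rnai_oa, papers, curators, failure_rate, round(100 * response_rate, 1))
```

This prints 641, 94, 81, 0.8 and 4.4, matching the reported totals. The response rate is about 4.4% of all requests, which the notes round to 5%. It is slightly higher if the bounced addresses are left out of the denominator.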
=== Comparison SObA ===
* Actually quite complicated; may require more consideration
=== SObA graph and Ontology Browser for papers ===
* May be able to modify/hack existing tools for genes and apply to papers
* Paper-term matching powered by Textpresso

Revision as of 14:27, 3 October 2019