Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(14 intermediate revisions by 2 users not shown)
Line 41: Line 41:
 
[[WormBase-Caltech_Weekly_Calls_August_2019|August]]
 
  
 +
[[WormBase-Caltech_Weekly_Calls_September_2019|September]]
  
== September 12, 2019 ==
 
  
=== Update on SVM pipeline ===
+
== October 3, 2019 ==
* New SVM pipeline: more analysis and more parameter tuning
 
* Avoiding precision (and F-value) as a measure, because both depend on the ratio of positives to negatives in the test set
 
* In the example shown, a "dumb" machine starts out with precision above 0.6
 
* G-value (Michael's invention); does not depend on distribution of sets
 
* Applied to various data types
 
* Analysis: 10-fold cross validation
 
** Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
 
* F-value changes over different p/n values; G-value does not (essentially flat)
 
* Area Under the Curve (AUC): probability that a random positive scores higher than random negative
 
* AUC values for many WB data types range from the upper 80s into the 90s (percent)
 
* Ranjana: How many papers for a good training set? Michael: we don't know yet
 
* Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
 
* If the SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
 
* Michael can provide training sets he has used recently
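A minimal sketch of three points from the SVM update above: drawing 10 folds without replacement, why precision tracks the positive/negative ratio, and AUC as the probability that a random positive outscores a random negative. This is not Michael's pipeline; the function names, scores, and set sizes are made up for illustration, and the G-value is not shown because the notes do not define it.

```python
import random


def stratified_folds(positives, negatives, k=10, seed=0):
    """Deal positives and negatives into k disjoint folds (sampling without replacement)."""
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Fold i takes every k-th paper, so each fold holds about 1/k of each class.
    return [(pos[i::k], neg[i::k]) for i in range(k)]


def precision_for_mix(recall, false_positive_rate, n_pos, n_neg):
    """Precision of one fixed classifier when only the test-set mix changes."""
    true_pos = recall * n_pos
    false_pos = false_positive_rate * n_neg
    return true_pos / (true_pos + false_pos)


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# A classifier that calls every paper positive (recall 1, false-positive rate 1)
# has precision equal to the fraction of positives in the test set.
print(precision_for_mix(1.0, 1.0, n_pos=65, n_neg=35))  # 0.65
print(precision_for_mix(1.0, 1.0, n_pos=10, n_neg=90))  # 0.1

# AUC stays the same when the negatives are replicated five-fold.
pos, neg = [0.9, 0.8, 0.4], [0.7, 0.3]
print(auc(pos, neg), auc(pos, neg * 5))
```

The all-positive classifier is only one way a trivial machine could reach a precision above 0.6; the notes do not say which baseline was shown.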
 
  
=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
* Definitions include meanings or words:
** "Variations in the ability"
** "aberrant"
** "defect"
** "defective"
** "defects"
** "deficiency"
** "deficient"
** "disrupted"
** "impaired"
** "incompetent"
** "ineffective"
** "perturbation that disrupts"
** Failure to execute the characteristic response = abnormal?
** abnormal
** abnormality leading to specific outcomes
** fail to exhibit the same taxis behavior = abnormal?
** failure
** failure OR delayed
** failure, slower OR late
** failure/abnormal
** reduced
** slower

+
=== SObA comparison graphs ===
+
* Raymond and Juancarlos have worked on a SObA-graph based comparison tool to compare two genes for ontology-based annotations
+
* [http://wobr2.caltech.edu/~azurebrd/cgi-bin/soba_multi.cgi?action=Gene+Pair+to+SObA+Graph Prototype 1]
+
** [http://wobr2.caltech.edu/~azurebrd/cgi-bin/soba_multi.cgi?action=annotSummaryCytoscape&filterForLcaFlag=1&filterLongestFlag=1&showControlsFlag=0&datatype=phenotype&geneOneValue=lin-3%20(Caenorhabditis%20elegans,%20WB:WBGene00002992,%20-,%20F36H1.4)&autocompleteValue=let-23%20(Caenorhabditis%20elegans,%20WB:WBGene00002299,%20-,%20ZK1067.1 Example comparison between lin-3 and let-23]
+
* [http://wobr1.caltech.edu/~azurebrd/cgi-bin/soba_multi.cgi?action=Gene+Pair+to+SObA+Graph Prototype 2]
+
** [http://wobr1.caltech.edu/~azurebrd/cgi-bin/soba_multi.cgi?action=annotSummaryCytoscape&filterForLcaFlag=1&filterLongestFlag=1&showControlsFlag=0&datatype=phenotype&geneOneValue=lin-3%20(Caenorhabditis%20elegans,%20WB:WBGene00002992,%20-,%20F36H1.4)&autocompleteValue=let-23%20(Caenorhabditis%20elegans,%20WB:WBGene00002299,%20-,%20ZK1067.1) Example comparison between lin-3 and let-23]
+
* What information does a user most care about?
+
# What terms (nodes) are annotated to gene 1 and what terms to gene 2
+
# For a given term, what is the relative number of annotations between gene 1 and gene 2.
+
# For a given node, what is the relative number of annotations each gene has to the total annotations of that gene.
+
* # 3 is actually what we applied to size the nodes in the single-gene version of SObA. Thus, not surprisingly, I think it is important.
+
* Generally people like Prototype 2 as a default view; we could possibly have a toggle to see the other view
+
* In either case users need a good legend and/or documentation
+
* Jae, it would be good if a user could specifically highlight nodes specific to each gene and gray-out or de-emphasize the common nodes
 
  
=== Citace upload ===
 
** Tuesday, Sep 24th
 
  
+
=== Germ line discussion ===
+
* Currently, the anatomy ontology has "germ line" as a type of "Cell" and a type of "Tissue", and "germ cell" as a type of "germ line"
+
* Chris would like to (1) remove "germ line" from under "Cell" and leave it under "Tissue" and (2) move "germ cell" out from under "germ line" and place directly under "Cell"
+
** [https://github.com/obophenotype/c-elegans-gross-anatomy-ontology/pull/23 Made pull request]
+
* Chris will update pull request to include a change to move "germline precursor cell" out from under "germ line" and place it under "Cell" (done)
+
=== Script to remove blank entries from Postgres ===
+
* Chris stumbled across several entries in the OA that were blank (empty strings) or consisted of only whitespace, some of which were causing errors upon upload to ACEDB
+
* Juancarlos has written a script to look for all such entries; 66 tables have them on sandbox (likely same on live OA)
+
* Does anyone object to removing these entries throughout Postgres?
+
* Juancarlos will remove all the empty fields identified by his script

=== Strain to ID mapping ===
* Waiting on Hinxton to send strain ID mapping file?
* Hopefully we can all get that well before the upload deadline
* Will do global replacement at time of citace upload (at least for now)

=== New name server ===
* When will this officially go live?
* Will we now be able to request strain IDs through the server? Yes

=== SObA Graphs ===
* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
 
* A lot of whitespace padding above and below graph; maybe trim? trimming vertically would ultimately limit the view pane when user wants to zoom in, so we should leave as is for now
 
* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
 
** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
 
 
 
 
 
== September 19, 2019 ==
 
 
 
=== Strains ===
 
* Need to wait for new strain IDs from Hinxton before running dumping scripts
 
* Don't edit multi-ontology strain fields in OA for now!
 
* Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
 
* "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now
 
 
 
=== Alliance literature curation ===
 
* Working group will be formed soon
 
* Will work out general common pipelines for literature curation
 
 
 
=== SObA Graph relations ===
 
* Currently only integrating over "is a", "part of" and "regulates"
 
* Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
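A rough sketch of what a user-selectable relation filter could look like: ancestors are collected only along the relations the user has switched on. The term names and edges are hypothetical placeholders, and this is not the SObA implementation.

```python
from collections import defaultdict

# Hypothetical edges: (child term, relation, parent term).
EDGES = [
    ("term_C", "is a", "term_B"),
    ("term_B", "part of", "term_A"),
    ("term_D", "regulates", "term_B"),
]


def ancestors(term, edges, relations):
    """All terms reachable from `term` by following only the allowed relations."""
    parents = defaultdict(set)
    for child, relation, parent in edges:
        if relation in relations:
            parents[child].add(parent)
    seen, stack = set(), [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Current behaviour: integrate over all three relations.
print(ancestors("term_D", EDGES, {"is a", "part of", "regulates"}))
# Proposed option: leave out "regulates".
print(ancestors("term_D", EDGES, {"is a", "part of"}))
```

With "regulates" excluded, term_D no longer rolls up into term_B or term_A, which is the effect the option would have on the graph.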
 
 
 
=== Author First Pass ===
 
* Putting together paper for AFP
 
* Reviewing all user input for paper
 
* Asking individual curators to check input
 
 
 
 
 
== September 26, 2019 ==
 
 
 
=== Data mining ===
 
* Someone in Paul's lab asked how to retrieve a list of C. elegans orthologs for a list of human genes
 
* Could we build a (simple) Alliance tool to do this?
 
* Could SimpleMine do this? Could we build a SimpleMine-like tool for Alliance?
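As a sketch of how simple such a batch lookup could be, assuming an orthology table is already available as a mapping: the two rows below are illustrative placeholders rather than an export from WormBase or the Alliance, and `worm_orthologs` is a hypothetical helper name.

```python
# Illustrative rows only; a real tool would load a full orthology table.
ORTHOLOGS = {
    "EGFR": ["let-23"],
    "INSR": ["daf-2"],
}


def worm_orthologs(human_genes, table):
    """Map each human gene symbol to its C. elegans orthologs (empty list if none known)."""
    return {gene: list(table.get(gene, [])) for gene in human_genes}


for gene, worm_genes in worm_orthologs(["EGFR", "INSR", "NOT_A_GENE"], ORTHOLOGS).items():
    # Tab-separated output, one line per input gene, in the input order.
    print(gene, ", ".join(worm_genes) or "no ortholog found", sep="\t")
```

Returning an explicit empty result for unmatched symbols keeps the output aligned with the submitted list, which is how a SimpleMine-style batch tool would typically be used.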
 
 
 
=== Strains ===
 
* Paul D generated WBStrains for the missing TransgeneOme objects
 
* Working on a pipeline to identify new TransgeneOme strains at each upload
 
* One TransgeneOme object had 2 strains. Possible solutions: dump 2 expression objects that differ only in the Strain or remove the UNIQUE tag in the data model
 
** Probably best to keep UNIQUE tag
 
* Raymond: concerned about automatically generating strains based on imports from the group
 
* Many odd strain names are coming from the TransgeneOme group; maybe we ought to have more discussions about generating official (following nomenclature standards) strain names from their imports
 
* Quarantine strains on initial import; review and accept them if they pass standards
 
 
 
=== Community phenotype requests August 2019 ===
 
* Sent out new round of phenotype requests on August 20, 21, and 22, 2019
 
* 2,626 emails/papers requested
 
* 114 emails bounced; 5 resent to new addresses
 
* 460 Phenotype OA community annotations; 181 RNAi OA annotations (641 annotations total)
 
* From 94 papers (83 for Phenotype OA; 33 for RNAi; 22 for both)
 
* By 81 distinct community curators (70 for Phenotype OA; 32 for RNAi OA; 21 for both)
 
* 50 papers flagged as not having phenotypes (40 of these papers DO have phenotypes; only 10 were correctly marked as negative; 80% failure rate!)
 
** Email states: "If there are no nematode phenotypes in this paper click the following link :"
 
** Maybe people are confused, or want to blow off the request
 
** Maybe we can programmatically generate short URLs for the link? May be difficult
 
** Provide a link to correct mistakes on confirmation page
 
* 4 papers flagged for phenotypes (only 2 had curatable phenotypes; 1 had honey-induced phenotypes)
 
* 115 papers with responses (5% response); 24 papers with input that were not main focus of request
 
* Can we provide an opt-out link?
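The counts above can be cross-checked with a few lines of arithmetic. Taking the 2,626 requests as the denominator for the response rate is an assumption; the notes do not say which denominator was used.

```python
def union_size(only_a_total, only_b_total, both):
    """Inclusion-exclusion: size of A or B given |A|, |B| and |A and B|."""
    return only_a_total + only_b_total - both


papers = union_size(83, 33, 22)        # Phenotype OA, RNAi OA, both
curators = union_size(70, 32, 21)      # Phenotype OA, RNAi OA, both
annotations = 460 + 181                # Phenotype OA + RNAi OA
wrong_negative_flags = 40 / 50         # flagged "no phenotypes" but phenotypes present
response_rate = 115 / 2626             # papers with responses / requests sent

print(papers, curators, annotations)   # 94 81 641
print(f"{wrong_negative_flags:.0%}")   # 80%
print(f"{response_rate:.1%}")          # 4.4%
```

The paper, curator, and annotation totals match the notes; on this denominator the response rate comes out near 4.4%, which the notes round to 5%.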
 
 
 
=== Comparison SObA ===
 
* Actually quite complicated; may require more consideration
 
 
 
=== SObA graph and Ontology Browser for papers ===
 
* May be able to modify/hack existing tools for genes and apply to papers
 
* Paper-term matching powered by Textpresso
 

Revision as of 18:26, 3 October 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August

September


October 3, 2019

SObA comparison graphs

  1. What terms (nodes) are annotated to gene 1 and what terms to gene 2
  2. For a given term, what is the relative number of annotations between gene 1 and gene 2.
  3. For a given node, what is the relative number of annotations each gene has to the total annotations of that gene.
  • # 3 is actually what we applied to size the nodes in the single-gene version of SObA. Thus, not surprisingly, I think it is important.
  • Generally people like Prototype 2 as a default view; we could possibly have a toggle to see the other view
  • In either case users need a good legend and/or documentation
  • Jae, it would be good if a user could specifically highlight nodes specific to each gene and gray-out or de-emphasize the common nodes
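A small sketch of the three quantities listed above, computed from per-term annotation counts for two genes. The term names and counts are invented, and `compare_genes` is a hypothetical helper, not code from the prototypes.

```python
def compare_genes(counts_one, counts_two):
    """Per-term comparison of two genes' annotation counts."""
    total_one = sum(counts_one.values())
    total_two = sum(counts_two.values())
    rows = {}
    for term in sorted(set(counts_one) | set(counts_two)):
        a, b = counts_one.get(term, 0), counts_two.get(term, 0)
        rows[term] = {
            # 1. Which gene(s) the term is annotated to.
            "annotated_to": "both" if a and b else ("gene 1" if a else "gene 2"),
            # 2. Gene 1's share of the annotations to this term across both genes.
            "gene_one_share": a / (a + b) if (a + b) else 0.0,
            # 3. The term's share of each gene's own total annotations.
            "fraction_of_gene_one": a / total_one if total_one else 0.0,
            "fraction_of_gene_two": b / total_two if total_two else 0.0,
        }
    return rows


gene_one = {"term_A": 3, "term_B": 1}
gene_two = {"term_B": 2, "term_C": 2}
for term, row in compare_genes(gene_one, gene_two).items():
    print(term, row)
```

The "annotated_to" field is also what a highlight or gray-out option would key on: terms marked "both" are the common nodes, and the rest are specific to one gene.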


Germ line discussion

  • Currently, the anatomy ontology has "germ line" as a type of "Cell" and a type of "Tissue", and "germ cell" as a type of "germ line"
  • Chris would like to (1) remove "germ line" from under "Cell" and leave it under "Tissue" and (2) move "germ cell" out from under "germ line" and place directly under "Cell"
  • Chris will update pull request to include a change to move "germline precursor cell" out from under "germ line" and place it under "Cell" (done)
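The proposed reparenting can be written out as edits to a term-to-parents map. This sketch lists only the parent links mentioned in the notes (the real ontology may have others), and `move_term` is a hypothetical helper.

```python
def move_term(parents, term, old_parent, new_parent=None):
    """Return a copy of the parent map with `term` detached from `old_parent`
    and, if given, attached to `new_parent`."""
    updated = {name: set(links) for name, links in parents.items()}
    updated[term].discard(old_parent)
    if new_parent is not None:
        updated[term].add(new_parent)
    return updated


# Current state as described in the notes.
current = {
    "germ line": {"Cell", "Tissue"},
    "germ cell": {"germ line"},
    "germline precursor cell": {"germ line"},
}

proposed = move_term(current, "germ line", "Cell")                     # stays under "Tissue"
proposed = move_term(proposed, "germ cell", "germ line", "Cell")
proposed = move_term(proposed, "germline precursor cell", "germ line", "Cell")

for term, links in proposed.items():
    print(term, "->", sorted(links))
```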

Script to remove blank entries from Postgres

  • Chris stumbled across several entries in the OA that were blank (empty strings) or consisted of only whitespace, some of which were causing errors upon upload to ACEDB
  • Juancarlos has written a script to look for all such entries; 66 tables have them on sandbox (likely same on live OA)
  • Does anyone object to removing these entries throughout Postgres?
  • Juancarlos will remove all the empty fields identified by his script
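A minimal sketch of the kind of check described above, finding and removing values that are empty or whitespace-only. It is not Juancarlos's script: an in-memory SQLite database stands in for Postgres so the example is self-contained, and the table and column names are hypothetical.

```python
import sqlite3


def find_blank_rows(conn, table, column):
    """Return (rowid, value) pairs whose value is empty or only whitespace."""
    # Table and column names are interpolated directly, so they must come
    # from a trusted list of known tables, never from user input.
    rows = conn.execute(f"SELECT rowid, {column} FROM {table}").fetchall()
    return [(rowid, value) for rowid, value in rows
            if isinstance(value, str) and value.strip() == ""]


def delete_blank_rows(conn, table, column):
    """Delete the blank rows found above and return how many were removed."""
    blank = find_blank_rows(conn, table, column)
    conn.executemany(f"DELETE FROM {table} WHERE rowid = ?",
                     [(rowid,) for rowid, _ in blank])
    return len(blank)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_field (joinkey TEXT, value TEXT)")
conn.executemany("INSERT INTO demo_field VALUES (?, ?)",
                 [("1", "lin-3"), ("2", ""), ("3", "  \t"), ("4", "let-23")])

removed = delete_blank_rows(conn, "demo_field", "value")
remaining = [row[0] for row in conn.execute(
    "SELECT value FROM demo_field ORDER BY joinkey")]
print(removed, remaining)
```

Running the find step on its own first, as was done on the sandbox, gives a per-table count to review before anything is deleted.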