Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(112 intermediate revisions by 6 users not shown)
Line 33: Line 33:
 
[[WormBase-Caltech_Weekly_Calls_April_2019|April]]
 
[[WormBase-Caltech_Weekly_Calls_April_2019|April]]
  
 +
[[WormBase-Caltech_Weekly_Calls_May_2019|May]]
  
== May 2, 2019 ==
+
[[WormBase-Caltech_Weekly_Calls_June_2019|June]]
  
=== SObA for all ontologies ===
+
[[WormBase-Caltech_Weekly_Calls_July_2019|July]]
* http://wobr2.caltech.edu/~raymond/cgi-bin/soba_biggo.cgi
 
* SObA graph search for all ontologies
 
* Juancarlos and Raymond looking for feedback; let them know
 
* Would be good to make it clear to users what the edges/arrows mean; what relations are included for inference? Can we include blurb in the legend?
 
** Raymond: thinking of putting blurb on a separate documentation page
 
* Can we track who is using this tool (and others)? SObA is hosted by WormBase so we'd have to ask Todd (Google analytics); SimpleMine is Caltech hosted so we'd need to look at Tazendra logs to see who (and how many people) are using the tool (cannot rule out bots etc.)
 
  
=== IWM swag - screen cloth ===
+
[[WormBase-Caltech_Weekly_Calls_August_2019|August]]
* Need to finalize design
 
* Put Dick & Jane cartoon in one corner, tools in another corner, WB logo, Alliance logo
 
* Daniel can take care of ordering and printing, but needs someone to choose the design
 
  
=== Phenotype ontology patternization ===
 
* Chris working on patternizing phenotype terms to standardize logical definitions for classes of terms
 
* This is part of the ongoing effort to align phenotype ontologies across the organisms
 
* Paul S: It's important for community submitted phenotypes (and the use of the form) to make sure the ontology makes sense and is readily browsable
 
* Does "dumpy" mean the same thing to everyone? Is there a way to assess discrepancies in term definitions among members of the community?
 
* Does someone annotate to Egl or vulvaless?
 
* How do we help community curators decide the correct term?
 
* Could we develop a smart annotation aid/tool/helper? Use images, examples, etc.
 
  
 +
== September 12, 2019 ==
  
== May 9, 2019 ==
+
=== Update on SVM pipeline ===
=== SObA for all ontologies ===
+
* New SVM pipeline: more analysis and more parameter tuning
* Revised version http://wobr2.caltech.edu/~raymond/cgi-bin/soba_biggo.cgi
+
* Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
* Changed legend labelling, added linkouts
+
* In the example shown, a "dumb" machine starts out with precision above 0.6
* To implement on Gene pages. One vs. Five widgets?
+
* G-value (Michael's invention); does not depend on distribution of sets
* Would be good to provide documentation on what the inferences mean; can we provide an indication of what relations a particular SObA edge could be comprised of?
+
* Applied to various data types
** Or click on an edge to get the, e.g., GO subgraph that connects the two nodes?
+
* Analysis: 10-fold cross validation
* Let's release to the website, but gather feedback from (naive) users to see what documentation we need
+
** Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
 +
* F-value changes over different p/n values; G-value does not (essentially flat)
 +
* Area Under the Curve (AUC): probability that a random positive scores higher than random negative
 +
* AUC values for many WB data types range from the upper 80s into the 90s (percent)
 +
* Ranjana: How many papers for a good training set? Michael: we don't know yet
 +
* Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
 +
* If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
 +
* Michael can provide training sets he has used recently
  
=== Citace upload next Tuesday ===
+
=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
* Tuesday May 14, 2019, 10am Pacific
+
* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
 +
* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
 +
* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
 +
* Definitions include meanings or words:
 +
** "Variations in the ability"
 +
** "aberrant"
 +
** "defect"
 +
** "defective"
 +
** "defects"
 +
** "deficiency"
 +
** "deficient"
 +
** "disrupted"
 +
** "impaired"
 +
** "incompetent"
 +
** "ineffective"
 +
** "perturbation that disrupts"
 +
** Failure to execute the characteristic response = abnormal?
 +
** abnormal
 +
** abnormality leading to specific outcomes
 +
** fail to exhibit the same taxis behavior = abnormal?
 +
** failure
 +
** failure OR delayed
 +
** failure, slower OR late
 +
** failure/abnormal
 +
** reduced
 +
** slower
  
=== SPELL is now indexed by Google ===
+
=== Citace upload ===
* Cool!
+
** Tuesday, Sep 24th
  
 +
=== Strain to ID mapping ===
 +
* Waiting on Hinxton to send strain ID mapping file?
 +
* Hopefully we can all get that well before the upload deadline
 +
* Will do global replacement at time of citace upload (at least for now)
  
== May 16, 2019 ==
+
=== New name server ===
 +
* When will this officially go live?
 +
* Will we now be able to request strain IDs through the server? Yes
  
=== SObA graphs ===
+
=== SObA Graphs ===
* Edges in graph can be clicked to generate pop ups with link to the (lower) term page showing the graph of inference to upper level nodes
+
* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
* May provide more information than the user needs/wants
+
* A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave as is for now
* Juancarlos working on implementing SObA widgets on gene pages (maybe for WS271 or WS272?)
+
* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
* IWM will have WS271 on staging during meeting; probably on production shortly after the meeting
+
** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
** Hope to show users demos at IWM
 
  
=== IWM swag: screen cloths ===
 
* Paul is working on the graphic
 
* Need to order soon
 
  
=== Testing on staging site ===
+
== September 19, 2019 ==
* Adam and Sibyl request people to test Sequences widgets on Pseudogene, CDS, and Transcript pages on staging
 
* Sibyl has asked for testing of new search on staging
 
  
 +
=== Strains ===
 +
* Need to wait for new strain IDs from Hinxton before running dumping scripts
 +
* Don't edit multi-ontology strain fields in OA for now!
 +
* Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
 +
* "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now
  
== May 23, 2019 ==
+
=== Alliance literature curation ===
 +
* Working group will be formed soon
 +
* Will work out general common pipelines for literature curation
  
=== WormBase Google folder ===
+
=== SObA Graph relations ===
* https://drive.google.com/drive/folders/1VYJbGYTa-7PncQcZW7VAy73DZ2jL0p4u?usp=sharing
+
* Currently only integrating over "is a", "part of" and "regulates"
 +
* Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
  
=== ?Genotype class ===
+
=== Author First Pass ===
* Chris and Ranjana will work on this early next week
+
* Putting together paper for AFP
* Will need to accommodate disease model annotations
+
* Reviewing all user input for paper
* Will bring up again for discussion with larger group next week
+
* Asking individual curators to check input
* Conclusion: ?Genotype objects will represent the entire genotype of any organism it inheres in, but we will specifically point out the causative agent (or agents) responsible for the disease or phenotype in the individual annotation
 
 
 
 
 
== May 30, 2019 ==
 
 
 
=== Phenotype Request Emails ===
 
* New round for May 2019
 
* 1139 emails total; 825 emails gone out so far today
 
* Chris will report back next week with numbers
 
 
 
=== ?Genotype class ===
 
* Google doc: https://docs.google.com/document/d/19hP9r6BpPW3FSAeC_67FNyNq58NGp4eaXBT42Ch3gDE/edit?usp=sharing
 
* Are or aren't we moving to serial identifiers for strains, e.g. WBStrain0000001? Need to ask Kevin et al.
 
* We would want to indicate in a ?Genotype object which are the "relevant" genes, proposed to be captured in an "Involves_gene" (or similarly labeled) tag
 
** For transgenes, which genes would be considered "relevant"? Only expressed genes? Or also genes for which the promoter and/or 3'UTR are used? We could start with expressed genes only, for now
 
** We would like to infer which genes are involved/relevant based on the components, like variations, rearrangements, transgenes, but we may also want to indicate which genes the authors assert to be part-of/relevant-to the genotype such that if new mappings point to a different gene, users can still trace back to which genotype the authors referred to in the paper originally
 
** Maybe we would want subtags of "Involves_gene" like "Author_asserted" and "WormBase_inferred"? We can try it; what would the XREF tag names be? Distinct?
 
** Do we really want to list all genes that are affected by a rearrangement?
 
** What about RNAi treatments? Should these be considered part of the genotype? Authors often report it that way
 
*** GENO ontology has made terms "intrinsic genotype" and "extrinsic genotype" to distinguish, for example, the nuclear genome genotype from "imposed" genotype effects like RNAi, morpholinos, etc.
 
* We may want to have a tag "Has_background_strain" to refer to the original wild type strain/isolate from which the genotype was derived; this would largely be N2 and an XREF would (over?)populate the N2 strain object with 1000s of "Is_background_for" associations.
 
** It may be best to include this tag, but make it a convention to only annotate it when it ISN'T N2
 
 
 
 
 
== June 6, 2019 ==
 
 
 
=== Phenotype requests ===
 
* Sent out 1140 emails on May 30
 
* Since then, we have received 374 annotations from 54 papers (42 requested, 12 additional)
 
* 21 papers flagged as not having phenotypes
 
* Of the 1140 papers emailed about, 35 emails bounced; in the first week we received some flagging or curation on 63 of the remaining 1105 (63/1105 = ~6% response rate)
 

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • In the example shown, a "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention); does not depend on distribution of sets
  • Applied to various data types
  • Analysis: 10-fold cross validation
    • Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than random negative
  • AUC values for many WB data types range from the upper 80s into the 90s (percent)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
  • If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
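
The evaluation points above can be illustrated with a minimal Python sketch. It is not Michael's pipeline: the scores are synthetic, the function names (`stratified_folds`, `auc`, `precision`) are hypothetical, and the G-value is not reproduced because its definition is not given in these notes. The sketch shows (1) splitting positives and negatives into 10 folds without replacement, (2) AUC computed directly as the probability that a random positive outscores a random negative, and (3) precision shifting when only the positive/negative ratio of the test set changes, while AUC stays the same.

```python
import random

def stratified_folds(pos_ids, neg_ids, k=10, seed=0):
    """Split positives and negatives into k folds without replacement,
    so each fold holds roughly 1/k of each class and every paper is used once."""
    rng = random.Random(seed)
    pos, neg = list(pos_ids), list(neg_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [pos[i::k] + neg[i::k] for i in range(k)]

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive scores
    higher; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def precision(pos_scores, neg_scores, threshold):
    """True positives divided by everything called positive at this threshold."""
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Synthetic classifier scores (assumption: positives score higher on average).
rng = random.Random(0)
pos = [rng.gauss(1.0, 1.0) for _ in range(200)]
neg = [rng.gauss(0.0, 1.0) for _ in range(200)]
neg_many = neg * 10  # same score distribution, ten times as many negatives

auc_balanced = auc(pos, neg)
auc_skewed = auc(pos, neg_many)
prec_balanced = precision(pos, neg, threshold=0.5)
prec_skewed = precision(pos, neg_many, threshold=0.5)

# A classifier that calls everything positive has precision equal to the
# fraction of positives in the test set (here 120 of 200 papers).
dumb_precision = precision(pos[:120], neg[:80], threshold=float("-inf"))

folds = stratified_folds(range(25), range(100, 140), k=10)

print("AUC balanced vs skewed:", auc_balanced, auc_skewed)
print("Precision balanced vs skewed:", prec_balanced, prec_skewed)
print("Call-everything-positive precision:", dumb_precision)
print("Fold sizes:", [len(f) for f in folds])
```

In this sketch the AUC is unchanged by replicating the negatives, while precision at a fixed threshold drops, which is the kind of class-ratio dependence the notes describe for precision and F-value.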

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Definitions include meanings or words:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

    • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating over "is a", "part of" and "regulates"
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
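
As a rough illustration of the option discussed above, the following Python sketch collects the upper-level terms reachable from a term while following only a user-chosen set of relations. It is not the SObA implementation: the toy edge list, the term names, and the `ancestors` function are hypothetical, and it treats inference as plain graph reachability, ignoring any relation-composition rules a real ontology reasoner would apply.

```python
from collections import deque

# Hypothetical toy ontology edges: (child term, relation, parent term).
EDGES = [
    ("A", "is_a", "B"),
    ("B", "part_of", "C"),
    ("A", "regulates", "D"),
    ("D", "is_a", "E"),
]

def ancestors(term, edges, relations):
    """Return all terms reachable from `term` by following only edges
    whose relation is in `relations` (breadth-first traversal)."""
    parents = {}
    for child, rel, parent in edges:
        if rel in relations:
            parents.setdefault(child, []).append(parent)
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

all_relations = ancestors("A", EDGES, {"is_a", "part_of", "regulates"})
no_regulates = ancestors("A", EDGES, {"is_a", "part_of"})

print(sorted(all_relations))  # ['B', 'C', 'D', 'E']
print(sorted(no_regulates))   # ['B', 'C']
```

Excluding "regulates" here removes D and everything inferred through it, which is the effect a user-facing relation filter would have on the graph.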

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input