Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(219 intermediate revisions by 8 users not shown)
== February 7, 2019 ==

=== Automation of curation ===
* For paper-by-paper curators (as opposed to datatype-by-datatype curators), it is difficult to parse papers into data types
* MGI and WB curate by datatype, so they may have an easier time
* All MODs still need paper triage

=== IWM workshop ===
* Ranjana replied to Julie about a consolidated WormBase workshop; hasn't heard back yet

=== Site-specific outreach ===
* Wen has been invited to a small RNA meeting in Mexico in April; may go if the organizers offer to pay (waiting to hear back)

=== SObA ===
* Raymond and Juancarlos have worked out a way to display top-slicing of the ontology (trimming to higher-level nodes)
* Needs a bit more cleanup; will send around a URL when ready

== February 21, 2019 ==

=== SObA ===
* SObA GO graph, with top-slicing function. You can try it here <http://wobr2.caltech.edu/~raymond/cgi-bin/soba_biggo.cgi> by entering the name of your favorite gene.
* Specific genes, from simple to complex:
** let-4 <http://wobr2.caltech.edu/~raymond/cgi-bin/soba_biggo.cgi?action=annotSummaryCytoscape&showControlsFlag=0&autocompleteValue=let-4%20(Caenorhabditis%20elegans,%20WB:WBGene00002282,%20-,%20C44H4.2,%20sym-5)>
** let-60 <http://wobr2.caltech.edu/~raymond/cgi-bin/soba_biggo.cgi?action=annotSummaryCytoscape&showControlsFlag=0&autocompleteValue=let-60%20(Caenorhabditis%20elegans,%20WB:WBGene00002335,%20-,%20ZK792.6,%20lin-34)>
** daf-2 <http://wobr2.caltech.edu/~raymond/cgi-bin/soba_biggo.cgi?action=annotSummaryCytoscape&showControlsFlag=0&autocompleteValue=daf-2%20(Caenorhabditis%20elegans,%20WB:WBGene00000898,%20-,%20Y55D5A.5)>
* Looks good
* Some confusion with the filled-circle legend (half red, half blue) for slim terms; maybe use two separate circles, one red and one blue, possibly labeled direct and indirect, respectively
* May be good to make the graph depth options more obvious; we could move them to the top of the legend, and maybe display all graph depths by default so a user can see them and select one
* Would it be possible to force display of any AGR/Alliance slim terms on the path from an annotation node to the root? (Not all pertinent slim terms currently display because of the trimming algorithms in place.) Possibly, but it would probably interfere significantly with the current trimming and display algorithm

=== Moving away from dependency on OBO format files ===
* Nico, an ontology developer with the Monarch group, has some suggestions for how we can move away from dependency on OBO files
* Planning to have a meeting on Monday (Feb 25) at 9am Pacific, 12pm Eastern, 5pm UK to discuss
* How are we handling OBO for GO? GO OBO is parsed into ACE format
* This should be a group-wide discussion, as we have many OBO dependencies
* We use several OBO-to-ACE conversion scripts
* What is the trend in the ontology field? It seems that OBO will be deprecated and superseded by OWL
* We will try to reschedule to Tuesday (same time) to include more people

=== Phenotype request mass emails ===
* 1,820 emails have gone out this week (since Monday) for 1,820 papers (957 still in queue; need to wait for the Google sending limit to reset)
* Have received 217 direct annotations for 49 papers (45 requested papers) from 47 distinct community curators
** 158 to Phenotype OA
** 59 to RNAi OA
* Have received 5 submissions via email (4 worksheets)
** 1 of these is just strain/transgene submissions (no phenotypes; 79 strains)
** 4 phenotype submissions, 4 curators, 93 phenotype annotations
* 57 papers flagged as not having phenotypes
* Total: 310 phenotype annotations from 51 community curators for 53 papers

== February 28, 2019 ==

=== Phenotype request mass emails ===
* 2,777 emails have gone out (since last week) for 2,777 papers
* Have received 421 direct annotations for 88 papers (81 requested papers; ~3% response rate) from 84 distinct community curators
** 313 to Phenotype OA
** 108 to RNAi OA
* Have received 5 submissions via email (4 worksheets)
** 1 of these is just strain/transgene submissions (no phenotypes; 79 strains)
** 4 phenotype submissions, 4 curators, 93 phenotype annotations
* 83 papers flagged as not having phenotypes
* Total: 514 phenotype annotations from 88 community curators for 92 papers
* ~6% response rate including negative flagging
* Are annotations accurate? Yes, as far as has been checked
* What depth of annotation are we getting? Good depth; although submissions may lack details such as treatment conditions, we often get very specific proposed phenotypes
* We could visit labs and email them right after the visit
* What about an undergrad course? Graduate students? A journal club activity?
* Caltech has a small number of biology majors, so this may not yield much, but we could focus on a specific pool of grad students
* Community vs WB curator comparison:
** Chris thoroughly curated one paper that was also curated by the community
** The community curator submitted four annotations; Chris curated 29
** On a per-annotation basis, processing community annotations is (for this paper) about 3-fold more efficient than manual curation; gains are probably higher for papers with more annotations
** Checking involves making sure, for example, that the submitted alleles, genes, and phenotypes are referred to in the paper; we may be able to use the AFP pipeline to automate some of this
* Bounced emails
** Out of 2,777 emails total, 150 bounced
** Of the 150: 85 failed to find the address, 32 failed without further detail, and 15 reported failure but suggested resending ("Policy reason")
** Have sent Cecilia the list of bounced addresses; she will update records accordingly, although it is sometimes hard to tell whether an update is needed, since our email may simply have been flagged as SPAM rather than the address being bad

=== OBO axioms ===
* Nico is asking for a list of all OBO axioms needed by WormBase
* He will then make sure that any OWL-derived OBO files (from the ODK pipeline) include the required axioms
* Collecting the list of OBO relations in [https://docs.google.com/document/d/1V7KnAIcFsf2Kydm3NyHQT0wUBU3Z8zmfk8ByssTJeLA/edit?usp=sharing minutes doc from meeting on Tuesday]

=== SObA update ===
* Probably going to freeze development of SObA soon
* Expand SObA graphs to all applicable ontologies
* SObA for other species: we need to assess interest

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a performance measure, since both depend on the ratio of positives to negatives in the test set
  • In the example shown, a "dumb" machine already starts out with a precision above 0.6
  • G-value (Michael's invention) does not depend on the positive/negative distribution of the sets
  • Applied to various data types
  • Analysis: 10-fold cross-validation
    • Randomly select 10% of positives and 10% of negatives (without replacement), and repeat until all papers have been sampled
  • F-value changes across different positive/negative ratios; G-value does not (essentially flat)
  • Area Under the Curve (AUC): the probability that a randomly chosen positive scores higher than a randomly chosen negative
  • AUC values for many WB data types are in the upper 80s to 90s (%)
  • Ranjana: How many papers make a good training set? Michael: we don't know yet
  • Old training sets (for the old SVM) can't be reproduced; provide Michael with better training sets if you want an improved SVM
  • If SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide the training sets he has used recently
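The measurement points above can be illustrated with a small Python sketch. It is not Michael's pipeline and does not implement the G-value, whose definition is not given in these notes; `expected_metrics`, `auc_pairwise` and `stratified_folds` are hypothetical helper names, and the rates and scores are made up. It shows that, for a classifier with a fixed true-positive and false-positive rate, precision and F1 shift with the positive/negative ratio while pairwise AUC does not, and it deals positives and negatives into 10 folds without replacement, which is one plausible reading of the cross-validation scheme described.

```python
import random


def expected_metrics(tpr, fpr, n_pos, n_neg):
    """Expected precision, recall and F1 for a classifier with a fixed
    true-positive rate (tpr) and false-positive rate (fpr), evaluated on a
    test set with n_pos positives and n_neg negatives."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tpr
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


def auc_pairwise(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive scores higher; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def stratified_folds(pos_ids, neg_ids, k=10, seed=0):
    """Shuffle positives and negatives separately and deal them into k folds,
    so each fold holds ~1/k of each class and every paper is used exactly
    once as test data (sampling without replacement)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for ids in (pos_ids, neg_ids):
        shuffled = list(ids)
        rng.shuffle(shuffled)
        for i, paper in enumerate(shuffled):
            folds[i % k].append(paper)
    return folds


if __name__ == "__main__":
    # Same classifier (tpr=0.8, fpr=0.1) on two differently balanced test sets.
    for n_pos, n_neg in [(50, 50), (10, 90)]:
        p, r, f = expected_metrics(0.8, 0.1, n_pos, n_neg)
        print(f"{n_pos}:{n_neg}  precision={p:.2f} recall={r:.2f} F1={f:.2f}")
    # A classifier that labels everything positive: precision = positive fraction.
    print(expected_metrics(1.0, 1.0, 60, 40)[0])
    # Replicating the negatives leaves AUC unchanged.
    pos, neg = [0.9, 0.8, 0.4], [0.7, 0.3]
    print(auc_pairwise(pos, neg), auc_pairwise(pos, neg * 3))
```

With tpr=0.8 and fpr=0.1, precision drops from about 0.89 on a balanced set to about 0.47 when only 10% of papers are positive, and a classifier that calls everything positive gets a precision equal to the positive fraction, which is one way a "dumb" machine can start out above 0.6.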

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Existing definitions use the following words or meanings:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Tuesday, Sep 24th

Strain to ID mapping

  • Still waiting on Hinxton to send the strain ID mapping file?
  • Hopefully we will receive it well before the upload deadline
  • For now, strain names will be globally replaced with strain IDs at the time of the citace upload
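A global name-to-ID replacement could look roughly like the minimal Python sketch below. It assumes a tab-separated mapping file and `Strain "name"` lines in the .ace output; both formats, the helper names `load_mapping` and `replace_strains`, and the `WBStrainTEST..` IDs are hypothetical placeholders, not the actual Hinxton file or WormBase dump format. It matches whole quoted names only and reports names with no mapping instead of guessing.

```python
import re


def load_mapping(lines):
    """Build a {strain name: strain ID} dict from hypothetical
    'strain_name<TAB>strain_id' lines, skipping blanks and # comments."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, strain_id = line.split("\t", 1)
        mapping[name] = strain_id
    return mapping


# Matches a hypothetical .ace line of the form:  Strain "CB4856"
STRAIN_LINE = re.compile(r'^([ \t]*Strain[ \t]+)"([^"]*)"', re.MULTILINE)


def replace_strains(ace_text, mapping):
    """Swap strain names for IDs on Strain lines. Only exact matches are
    replaced, so a name is never rewritten inside a longer name.
    Returns the new text and the sorted list of unmapped names."""
    unmapped = set()

    def swap(match):
        prefix, name = match.group(1), match.group(2)
        if name in mapping:
            return f'{prefix}"{mapping[name]}"'
        unmapped.add(name)
        return match.group(0)

    return STRAIN_LINE.sub(swap, ace_text), sorted(unmapped)


if __name__ == "__main__":
    mapping = load_mapping(["N2\tWBStrainTEST01", "CB4856\tWBStrainTEST02"])
    ace = 'Paper : "WBPaperTEST"\nStrain "N2"\nStrain "XYZ1"\n'
    new_text, missing = replace_strains(ace, mapping)
    print(new_text)
    print("unmapped:", missing)
```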

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below the graph; maybe trim it? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we will leave it as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
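The prototype diff tool's internals are not described in these notes. As a hedged illustration only, a gene-to-gene comparison can be reduced to set operations on annotation term IDs; `compare_annotations` and the term IDs below are hypothetical. Comparing annotation sets this way counts every missing annotation as a difference, which is one way annotation coverage, rather than biology, can show up in the comparison.

```python
def compare_annotations(terms_a, terms_b):
    """Compare two genes' annotation term sets: shared terms, terms unique
    to each gene, and Jaccard similarity (|shared| / |union|)."""
    a, b = set(terms_a), set(terms_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Two genes with no annotations at all are treated as identical.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Made-up term IDs for two hypothetical genes.
    gene_a = ["T1", "T2", "T3"]
    gene_b = ["T2", "T3", "T4", "T5"]
    print(compare_annotations(gene_a, gene_b))
```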


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA is not dumped, so we don't need to worry about it right now

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating annotations over the "is a", "part of" and "regulates" relations
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
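A user-selectable relation option amounts to rolling annotation counts up the ontology while following only the chosen relations. The minimal Python sketch below shows this under stated assumptions: relations carry OBO-style labels (`is_a`, `part_of`, `regulates`), and the graph, term names and counts are made up. It is not the SObA implementation, and `ancestors` and `roll_up` are hypothetical names.

```python
from collections import defaultdict


def ancestors(term, parents_of, allowed):
    """All terms reachable from `term` by following only `allowed` relations."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in parents_of.get(current, ()):
            if relation in allowed and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def roll_up(direct_counts, edges, allowed=("is_a", "part_of", "regulates")):
    """Add each term's direct annotation count to itself and to every
    ancestor reachable via the allowed relations; an ancestor reachable by
    several paths still receives the count only once per term."""
    parents_of = defaultdict(list)
    for child, relation, parent in edges:
        parents_of[child].append((relation, parent))
    allowed = set(allowed)
    totals = defaultdict(int)
    for term, count in direct_counts.items():
        totals[term] += count
        for ancestor in ancestors(term, parents_of, allowed):
            totals[ancestor] += count
    return dict(totals)


if __name__ == "__main__":
    # Toy graph: a is_a b, b part_of c, r regulates b.
    edges = [("a", "is_a", "b"), ("b", "part_of", "c"), ("r", "regulates", "b")]
    direct = {"a": 2, "r": 3}
    print(roll_up(direct, edges))                       # all three relations
    print(roll_up(direct, edges, ("is_a", "part_of")))  # "regulates" excluded
```

Excluding `regulates` keeps the three annotations on `r` from flowing into `b` and `c`, which is the kind of difference a relation toggle would expose to users.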

Author First Pass

  • Putting together a paper on AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input