Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(26 intermediate revisions by 4 users not shown)

Content removed from the page in this revision:

August 1, 2019

Life stage public names missing in WS271

  • Did we ever get a patch in for this?
  • WormMine only has WBls IDs (no public names) for almost all life stages
  • Wen will resend the patch file

2020 WB NAR paper

  • Who's contributing?
    • Raymond, Chris, Ranjana, Valerio, Daniela, Kimberly
  • Topics
    • Automated descriptions
    • Ontology tools (SObA)
    • Community curation
    • Author first pass

2019 IWM Workshop videos

  • On YouTube, but not public yet
  • Chris will make them public and send the links to Ranjana for the blog post

August 15, 2019

GO Alliance slim terms

  • We need to update our GO slim terms for the WB GO ribbons to be in sync with the Alliance
  • May need to watch out for terms that don't apply to worms
  • Raymond loads slim terms into Solr from the OBO release file; Sibyl collects them from a different source; the two should be made the same (pull from the WB FTP site?)

Phenotype ontology patternization

  • Now have 676 terms patternized (27% of 2,506 terms total)
  • Have reviewed the class hierarchy and are collecting a list of unexpected class subsumptions
  • Issues to address are collected here: https://docs.google.com/document/d/1IWtQbEQ-elM-U5SQyU4VfIH3vdJp6taVMGViIjGyVks/edit?usp=sharing

August 22, 2019

Obsolete ontology terms in Postgres

  • There are currently 172 GO annotations in the GO OA, 94 in the Expr OA, and 54 in the Pic OA that refer to obsolete GO terms
    • https://docs.google.com/spreadsheets/d/14iG3-s0GrZ3_W87iOjD6tZiQiUJklRRjgs6ARFWi9E4/edit?usp=sharing
  • We would like a mechanism for detecting obsolete ontology terms in the OA/Postgres and alerting curators to them
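The detection step could be sketched as follows. This sketch assumes the ontology is available as an OBO-format release file and that the OA annotations can be exported as (annotation ID, term ID) pairs. The Postgres/OA schema is not described in these notes. The `is_obsolete`, `replaced_by`, and `consider` tags are standard OBO stanza tags. The function names, the `EX:` term IDs, and the sample data are hypothetical. Querying Postgres and sending the alerts themselves are left out.

```python
from typing import Dict, Iterable, List, Tuple

# Made-up OBO snippet for illustration (EX: IDs are not real ontology terms).
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000001
name: current term

[Term]
id: EX:0000002
name: retired term
is_obsolete: true
replaced_by: EX:0000001
"""


def parse_obsolete_terms(obo_text: str) -> Dict[str, List[str]]:
    """Return {obsolete term ID: suggested replacement IDs} from OBO text.

    Only [Term] stanzas are read. A term is obsolete if its stanza contains
    'is_obsolete: true'; suggestions come from 'replaced_by' and 'consider'.
    """
    stanzas: List[List[str]] = []
    current: List[str] = []
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza header ([Term], [Typedef], ...) closes the previous one.
            stanzas.append(current)
            current = [line]
        elif line:
            current.append(line)
    stanzas.append(current)

    obsolete: Dict[str, List[str]] = {}
    for stanza in stanzas:
        if not stanza or stanza[0] != "[Term]":
            continue  # skip the file header block and non-Term stanzas
        tags: Dict[str, List[str]] = {}
        for line in stanza[1:]:
            key, sep, value = line.partition(":")
            if sep:
                # Drop trailing '!' comments, e.g. "replaced_by: EX:1 ! name".
                tags.setdefault(key.strip(), []).append(value.split("!")[0].strip())
        if tags.get("is_obsolete") == ["true"] and "id" in tags:
            obsolete[tags["id"][0]] = tags.get("replaced_by", []) + tags.get("consider", [])
    return obsolete


def flag_obsolete_annotations(
    annotations: Iterable[Tuple[str, str]], obsolete: Dict[str, List[str]]
) -> List[Tuple[str, str, List[str]]]:
    """List (annotation ID, obsolete term ID, suggestions) for curator review."""
    return [
        (ann_id, term_id, obsolete[term_id])
        for ann_id, term_id in annotations
        if term_id in obsolete
    ]


if __name__ == "__main__":
    obsolete_terms = parse_obsolete_terms(SAMPLE_OBO)
    rows = [("ann1", "EX:0000001"), ("ann2", "EX:0000002")]
    for ann_id, term_id, suggestions in flag_obsolete_annotations(rows, obsolete_terms):
        print(f"{ann_id}: {term_id} is obsolete; consider {suggestions}")
```

Each flagged annotation is reported with its `replaced_by`/`consider` suggestions, which gives curators a starting point for the fix. The sketch does not decide whether a replacement can be applied automatically; that remains a curation decision.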
 
 
 
Community phenotype requests

  • Sent out a new round of phenotype requests on August 20, 21, and 22 (today), 2019
  • 2,627 emails/papers requested
  • 112 emails bounced; 4 were resent to new addresses
  • So far: 205 Phenotype OA community annotations and 55 RNAi OA annotations, from 47 papers, by 42 distinct community curators
  • Also, 3 worksheets were submitted, covering 4 papers
  • 35 papers were flagged as not having phenotypes
 

Revision as of 21:07, 12 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • In the example shown, a "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention) does not depend on the positive/negative distribution of the sets
  • Applied to various data types
  • Analysis: 10-fold cross-validation
    • Randomly select 10% of positives and negatives (without replacement) and repeat until all papers have been sampled
  • F-value changes across different positive/negative (p/n) ratios; G-value does not (it is essentially flat)
  • Area Under the Curve (AUC): the probability that a randomly chosen positive scores higher than a randomly chosen negative
  • AUC values for many WB data types range from the upper 80s into the 90s (%)
  • Ranjana: How many papers make a good training set? Michael: We don't know yet
  • The old training sets (for the old SVM) can't be reproduced; provide Michael with better training sets if you want an improved SVM
  • If the SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide the training sets he has used recently
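The small Python sketch below illustrates three of the evaluation points above: the class-ratio dependence of precision and F-value, fold sampling for 10-fold cross-validation, and AUC as a ranking probability. It does not reproduce Michael's pipeline, and it does not implement the G-value, which these notes do not define. The "dumb" classifier is assumed here to label every paper positive, since the notes do not say how it was built. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def all_positive_precision_and_f(n_pos: int, n_neg: int) -> Tuple[float, float]:
    """Precision and F1 of a classifier that labels every paper positive.

    Recall is always 1, so both scores depend only on the class ratio.
    """
    precision = n_pos / (n_pos + n_neg)
    recall = 1.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, f1


def stratified_folds(pos: Sequence[str], neg: Sequence[str], k: int = 10,
                     seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Split papers into k folds, each with ~1/k of the positives and ~1/k of
    the negatives, sampled without replacement so each paper is tested once."""
    rng = random.Random(seed)
    pos, neg = list(pos), list(neg)
    rng.shuffle(pos)
    rng.shuffle(neg)
    # After shuffling, fold i takes every k-th item starting at offset i.
    return [(pos[i::k], neg[i::k]) for i in range(k)]


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as half. O(P*N), fine for a sketch."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Precision and F1 of the all-positive classifier move with the class ratio...
    for n_pos, n_neg in [(60, 40), (30, 70), (10, 90)]:
        p, f = all_positive_precision_and_f(n_pos, n_neg)
        print(f"pos:neg={n_pos}:{n_neg}  precision={p:.2f}  F1={f:.2f}")
    # ...while the AUC of a constant score is 0.5 whatever the ratio.
    print(auc([1.0] * 10, [1.0] * 90))
```

Running it shows the precision of the all-positive classifier tracking the fraction of positives (0.60, 0.30, 0.10), with F1 following it. A ranking measure such as AUC stays at 0.5 for an uninformative score regardless of the ratio.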

Clarifying definitions of "defective" and "deficient" for phenotypes

  • The WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • We have tried to create a logical definition pattern for these terms, but the vagueness of "defective", and of how it differs from "abnormal", has stalled the process
  • What exactly do we mean by "defective", and how, specifically, is it distinct from "abnormal"?
  • Current term definitions include the following words or meanings:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Tuesday, Sep 24th

Strain to ID mapping

  • Are we waiting on Hinxton to send the strain-to-ID mapping file?
  • Hopefully we will all have it well before the upload deadline
  • We will do a global replacement at the time of the citace upload (at least for now)
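Below is a minimal sketch of such a global replacement. It assumes, hypothetically, that the mapping file consists of tab-separated `strain name<TAB>strain ID` lines. These notes do not give the actual format of the Hinxton file, and the IDs in the example are made up. Only complete strain names are replaced, so a name such as `N2` is not rewritten inside a longer token like `N2X`.

```python
import re
from typing import Dict


def load_strain_map(tsv_text: str) -> Dict[str, str]:
    """Parse hypothetical 'strain name<TAB>strain ID' lines into a dict."""
    mapping: Dict[str, str] = {}
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed lines rather than guessing
        mapping[fields[0].strip()] = fields[1].strip()
    return mapping


def replace_strain_names(text: str, mapping: Dict[str, str]) -> str:
    """Replace every complete strain name in text with its ID in a single pass."""
    if not mapping:
        return text
    # Longest names first; the lookarounds reject matches that sit inside a
    # longer word, so only whole strain names are replaced.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(n) for n in names) + r")(?!\w)"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)


if __name__ == "__main__":
    strain_map = load_strain_map("# name\tid\nN2\tID0001\nCB4856\tID0002\n")
    print(replace_strain_names("Strain N2 and CB4856; not N2X.", strain_map))
```

Doing the replacement in one regex pass means an inserted ID is never rescanned and replaced again.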

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • There is a lot of whitespace padding above and below the graph; should we trim it? Trimming vertically would limit the view pane when the user wants to zoom in, so we should leave it as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (e.g., for comparing two genes)
    • Paul compared two genes that should be very similar but found many differences; these may reflect annotation coverage rather than biology