Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(19 intermediate revisions by 4 users not shown)
Line 39: Line 39:
 
[[WormBase-Caltech_Weekly_Calls_July_2019|July]]
 
  
 +
[[WormBase-Caltech_Weekly_Calls_August_2019|August]]
  
== August 1, 2019 ==
 
  
=== Life stage public names missing in WS271 ===
+
== September 12, 2019 ==
* Did we ever get a patch in for this?
 
* WormMine only has WBls IDs (no public name) for almost all life stages
 
* Wen will resend the patch file
 
  
=== 2020 WB NAR paper ===
+
=== Update on SVM pipeline ===
* Who's contributing?
+
* New SVM pipeline: more analysis and more parameter tuning
** Raymond, Chris, Ranjana, Valerio, Daniela, Kimberly
+
* Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
* Topics
+
* In the example shown, a "dumb" machine starts out with precision above 0.6
** Automated descriptions
+
* G-value (Michael's invention); does not depend on distribution of sets
** Ontology tools (SObA)
+
* Applied to various data types
** Community curation
+
* Analysis: 10-fold cross validation
** Author first pass
+
** Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
 +
* F-value changes over different p/n values; G-value does not (essentially flat)
 +
* Area Under the Curve (AUC): probability that a random positive scores higher than a random negative
 +
* AUC values for many WB data types are in the upper 80s into the 90s (percent)
 +
* Ranjana: How many papers for a good training set? Michael: we don't know yet
 +
* Can't reproduce old training sets (for old SVM); provide Michael with better training sets if you want an improved SVM
 +
* If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
 +
* Michael can provide training sets he has used recently
  
=== 2019 IWM Workshop videos ===
+
=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
* On YouTube, but not public yet
+
* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
* Chris will make them public and send links to Ranjana for the blog post
+
* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
 +
* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
 +
* Definitions include meanings or words:
 +
** "Variations in the ability"
 +
** "aberrant"
 +
** "defect"
 +
** "defective"
 +
** "defects"
 +
** "deficiency"
 +
** "deficient"
 +
** "disrupted"
 +
** "impaired"
 +
** "incompetent"
 +
** "ineffective"
 +
** "perturbation that disrupts"
 +
** Failure to execute the characteristic response = abnormal?
 +
** abnormal
 +
** abnormality leading to specific outcomes
 +
** fail to exhibit the same taxis behavior = abnormal?
 +
** failure
 +
** failure OR delayed
 +
** failure, slower OR late
 +
** failure/abnormal
 +
** reduced
 +
** slower
  
 +
=== Citace upload ===
 +
** Tuesday, Sep 24th
  
== August 15, 2019 ==
+
=== Strain to ID mapping ===
 +
* Waiting on Hinxton to send strain ID mapping file?
 +
* Hopefully we can all get that well before the upload deadline
 +
* Will do global replacement at time of citace upload (at least for now)
  
=== GO Alliance slim terms ===
+
=== New name server ===
* We need to update our GO slim terms for WB GO ribbons to be in sync with Alliance
+
* When will this officially go live?
* May need to watch out for terms that don't apply to worms
+
* Will we now be able to request strain IDs through the server? Yes
* Raymond gets slim terms into Solr from OBO release file; Sibyl collecting from different source; should make the same (pull from WB FTP site?)
 
  
=== Phenotype ontology patternization ===
+
=== SObA Graphs ===
* Now have 676 terms patternized (27% of 2,506 terms total)
+
* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
* Have reviewed the class hierarchy, collecting list of unexpected class subsumptions
+
* A lot of whitespace padding above and below the graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
* Issues to address collected here: https://docs.google.com/document/d/1IWtQbEQ-elM-U5SQyU4VfIH3vdJp6taVMGViIjGyVks/edit?usp=sharing
+
* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
 
+
** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
 
 
== August 22, 2019 ==
 
 
 
=== Obsolete ontology terms in Postgres ===
 
* There are currently 172 GO annotations in the GO OA, 94 in Expr OA, and 54 in Pic OA referring to obsolete GO terms
 
** https://docs.google.com/spreadsheets/d/14iG3-s0GrZ3_W87iOjD6tZiQiUJklRRjgs6ARFWi9E4/edit?usp=sharing
 
* We would like a mechanism for detecting and alerting curators to obsolete ontology terms in the OA/Postgres
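
One possible shape for such a check, as a minimal sketch only: read the ontology's OBO release file, collect every term whose stanza carries `is_obsolete: true`, and report annotations that still point at those terms. The function names, the `(source, annotation_id, term_id)` tuple layout, and the `EX:` identifiers below are hypothetical and chosen for illustration; the notes do not say how the OA/Postgres tables would actually be queried.

```python
from collections import defaultdict


def obsolete_ids(obo_text):
    """Return the IDs of [Term] stanzas tagged 'is_obsolete: true' in OBO text."""
    obsolete = set()
    in_term = False
    term_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest here.
            in_term = (line == "[Term]")
            term_id = None
        elif in_term and line.startswith("id:"):
            term_id = line[3:].strip()
        elif in_term and term_id and line.startswith("is_obsolete:"):
            if line.split(":", 1)[1].strip() == "true":
                obsolete.add(term_id)
    return obsolete


def flag_annotations(annotations, obsolete):
    """Group annotations that reference an obsolete term by their source.

    `annotations` is an iterable of (source, annotation_id, term_id) tuples;
    this layout is an assumption, not the real OA schema.
    """
    flagged = defaultdict(list)
    for source, annotation_id, term_id in annotations:
        if term_id in obsolete:
            flagged[source].append((annotation_id, term_id))
    return dict(flagged)


# Made-up ontology fragment and annotation rows, for illustration only.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000001
name: current term

[Term]
id: EX:0000002
name: obsolete retired term
is_obsolete: true
"""

SAMPLE_ANNOTATIONS = [
    ("go_oa", "row1", "EX:0000001"),
    ("expr_oa", "row7", "EX:0000002"),
    ("pic_oa", "row9", "EX:0000002"),
]

if __name__ == "__main__":
    report = flag_annotations(SAMPLE_ANNOTATIONS, obsolete_ids(SAMPLE_OBO))
    for source, rows in sorted(report.items()):
        print(source, rows)
```

Run on a schedule against each new ontology release, the per-source report could be what gets sent to curators; how the alert is delivered is left open here.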
 
 
 
=== Community phenotype requests ===
 
* Sent out new round of phenotype requests on August 20, 21, and 22 (today) 2019
 
* 2,627 emails/papers requested
 
* 112 emails bounced; 4 resent to new addresses
 
* 205 Phenotype OA community annotations; 55 RNAi OA annotations, from 47 papers, by 42 distinct community curators (so far)
 
* Also, 3 worksheets submitted, for 4 papers
 
* 35 papers flagged as not having phenotypes
 
* 86 papers with responses (3% response; still early)
 
* Need to coordinate with the AFP request pipeline
 
 
 
=== GO slim terms ===
 
* SObA highlights Alliance slim terms, but doesn't correspond to ribbon
 
* Want to use same slim terms used for ribbons
 
* Add slim terms into ACEDB? One option
 
* Ribbon order of slim terms is an issue
 
* Decided not to store ribbon info (slim terms and term order) in ACEDB, but rather in web code
 
* Will have to manually synchronize with Alliance if Alliance changes its ribbon
 
 
 
== August 29, 2019 ==
 
 
 
=== Caltech use of Hinxton name service ===
 
* Name service :  
 
** https://names.wormbase.org/gene
 
* API documentation :
 
** https://names.wormbase.org/api-docs/index.html#!
 
* Sibyl is wondering :
 
** "it would be nice to know when and how curators use the name service. And does any script managed by Caltech write data into the name services?"
 
** "what needs to be done to adopt the new name service. For example, what concern do you or the curators have about adopting the new name service, what kind of tests will allow the curators to trust that the new name service is working correctly, what changes to the new name service is needed for Caltech to be able to use it."
 

Revision as of 21:07, 12 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • In the example shown, a "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention); does not depend on distribution of sets
  • Applied to various data types
  • Analysis: 10-fold cross validation
    • Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than a random negative
  • AUC values for many WB data types are in the upper 80s into the 90s (percent)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael with better training sets if you want an improved SVM
  • If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
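
The evaluation points above can be made concrete with a small sketch. It is not Michael's pipeline: the function names, class counts, and scores are invented for illustration, and the G-value is left out because these notes do not define it. The sketch shows (1) that a trivial classifier labelling every paper positive has precision equal to the fraction of positives in the test set, which is one way a "dumb" machine could start above 0.6, (2) AUC computed directly from its definition as a pairwise probability, and (3) a 10-fold split that samples positives and negatives separately, without replacement, so every paper is tested exactly once.

```python
import random


def dumb_precision(n_pos, n_neg):
    """Precision of a classifier that labels every paper positive: TP / (TP + FP)."""
    return n_pos / (n_pos + n_neg)


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This is the pairwise definition; it is
    quadratic in the number of papers, which is fine for a small demo.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def ten_fold_splits(pos_ids, neg_ids, k=10, seed=0):
    """Deal shuffled positives and negatives into k disjoint test folds.

    Each fold receives about 1/k of the positives and 1/k of the negatives,
    and every paper lands in exactly one fold.
    """
    rng = random.Random(seed)
    pos, neg = list(pos_ids), list(neg_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [pos[i::k] + neg[i::k] for i in range(k)]


if __name__ == "__main__":
    # Same trivial classifier, different class ratios (made-up counts).
    print(dumb_precision(60, 40))  # 0.6
    print(dumb_precision(10, 90))  # 0.1

    # Made-up classifier scores; AUC is unchanged when negatives are replicated.
    pos = [0.9, 0.8, 0.4]
    neg = [0.7, 0.3, 0.2]
    print(auc(pos, neg), auc(pos, neg * 5))

    folds = ten_fold_splits(range(20), range(100, 150))
    print([len(fold) for fold in folds])
```

Replicating the negatives five times changes the class ratio but leaves the pairwise AUC untouched, whereas the precision of the trivial classifier moves with the ratio.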

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Definitions include meanings or words:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below the graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology