Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(14 intermediate revisions by one other user not shown)
[[WormBase-Caltech_Weekly_Calls_July_2019|July]]

[[WormBase-Caltech_Weekly_Calls_August_2019|August]]

== August 1, 2019 ==

=== Life stage public names missing in WS271 ===
* Did we ever get a patch in for this?
* WormMine only has WBls IDs (no public name) for almost all life stages
* Wen will resend the patch file

=== 2020 WB NAR paper ===
* Who's contributing?
** Raymond, Chris, Ranjana, Valerio, Daniela, Kimberly
* Topics
** Automated descriptions
** Ontology tools (SObA)
** Community curation
** Author first pass

=== 2019 IWM Workshop videos ===
* On YouTube, but not public yet
* Chris will make them public and send links to Ranjana for the blog post

== August 15, 2019 ==
 
 
=== GO Alliance slim terms ===
 
* We need to update our GO slim terms for WB GO ribbons to be in sync with Alliance
 
* May need to watch out for terms that don't apply to worms
 
* Raymond gets slim terms into Solr from the OBO release file; Sibyl is collecting them from a different source; both should use the same source (pull from WB FTP site?)
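One way to keep the two slim lists identical is to derive both from the same OBO release file. The following is a minimal sketch, assuming slim membership is recorded with <code>subset:</code> tags inside <code>[Term]</code> stanzas and that the Alliance slim uses the tag <code>goslim_agr</code> (an assumption to check against the actual release file). The function name, the <code>EX:</code> identifiers, and the sample stanzas are hypothetical.

```python
def slim_terms(obo_text, subset="goslim_agr"):
    """Return {term_id: name} for non-obsolete [Term] stanzas tagged with `subset`.

    Assumes stanzas are separated by blank lines, as in OBO flat files.
    """
    slims = {}
    for stanza in obo_text.split("\n\n"):
        lines = [ln.strip() for ln in stanza.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue
        tags = {}
        for ln in lines[1:]:
            key, sep, value = ln.partition(": ")
            if sep:
                tags.setdefault(key, []).append(value)
        if "true" in tags.get("is_obsolete", []):
            continue
        # Keep only the subset name; drop any trailing "! comment" text.
        subsets = {v.split()[0] for v in tags.get("subset", []) if v.split()}
        if subset in subsets and "id" in tags:
            slims[tags["id"][0]] = tags.get("name", [""])[0]
    return slims


# Hypothetical stanzas, for illustration only.
sample = """[Term]
id: EX:0000001
name: example slim term
subset: goslim_agr

[Term]
id: EX:0000002
name: example non-slim term
"""

print(slim_terms(sample))
```

Terms that do not apply to worms could then be filtered from the returned dictionary before loading it into Solr or the web code.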
 
 
 
=== Phenotype ontology patternization ===
 
* Now have 676 terms patternized (27% of 2,506 terms total)
 
* Have reviewed the class hierarchy, collecting list of unexpected class subsumptions
 
* Issues to address collected here: https://docs.google.com/document/d/1IWtQbEQ-elM-U5SQyU4VfIH3vdJp6taVMGViIjGyVks/edit?usp=sharing
 
  
== August 22, 2019 ==

=== Obsolete ontology terms in Postgres ===
* There are currently 172 GO annotations in the GO OA, 94 in Expr OA, and 54 in Pic OA referring to obsolete GO terms
** https://docs.google.com/spreadsheets/d/14iG3-s0GrZ3_W87iOjD6tZiQiUJklRRjgs6ARFWi9E4/edit?usp=sharing
* We would like a mechanism for detecting and alerting curators to obsolete ontology terms in the OA/Postgres
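
A detection mechanism of this kind could be sketched as follows: collect the IDs marked <code>is_obsolete: true</code> in an OBO file, then flag any annotation rows that still reference them. This is only an illustration under stated assumptions. The table names, row IDs, and <code>EX:</code> term IDs are hypothetical, and the real OA/Postgres schema is not described in these notes, so the annotations are passed in as plain tuples rather than queried.

```python
def obsolete_ids(obo_text):
    """Collect IDs of [Term] stanzas marked `is_obsolete: true`."""
    obsolete = set()
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[4:].strip()
        elif in_term and line == "is_obsolete: true" and current_id:
            obsolete.add(current_id)
    return obsolete


def flag_obsolete(annotations, obsolete):
    """Group annotation rows that reference an obsolete term by table.

    `annotations` is an iterable of (table_name, row_id, term_id) tuples.
    """
    report = {}
    for table, row_id, term_id in annotations:
        if term_id in obsolete:
            report.setdefault(table, []).append((row_id, term_id))
    return report


# Hypothetical ontology excerpt and annotation rows.
obo = """[Term]
id: EX:0000010
name: current term

[Term]
id: EX:0000011
name: retired term
is_obsolete: true
"""
rows = [
    ("go_oa", 1, "EX:0000010"),
    ("go_oa", 2, "EX:0000011"),
    ("expr_oa", 7, "EX:0000011"),
]

report = flag_obsolete(rows, obsolete_ids(obo))
for table, hits in sorted(report.items()):
    print(table, hits)
```

The resulting report could be emailed to the curators responsible for each table whenever a new ontology release is loaded.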

=== Community phenotype requests ===
* Sent out new round of phenotype requests on August 20, 21, and 22 (today) 2019
* 2,627 emails/papers requested
* 112 emails bounced; 4 resent to new addresses
* 205 Phenotype OA community annotations; 55 RNAi OA annotations, from 47 papers, by 42 distinct community curators (so far)
* Also, 3 worksheets submitted, for 4 papers
* 35 papers flagged as not having phenotypes
* 86 papers with responses (3% response; still early)
* Need to coordinate with the AFP request pipeline

=== GO slim terms ===
* SObA highlights Alliance slim terms, but doesn't correspond to ribbon
* Want to use same slim terms used for ribbons
* Add slim terms into ACEDB? One option
* Ribbon order of slim terms is an issue
* Decided not to store ribbon info (slim terms and term order) in ACEDB, but rather in web code
 
* Will have to manually synchronize with Alliance if Alliance changes its ribbon
 
 
 
== August 29, 2019 ==
 
 
 
=== Caltech use of Hinxton name service ===
 
* Name service:

** https://names.wormbase.org/gene

* API documentation:

** https://names.wormbase.org/api-docs/index.html#!

* Sibyl is wondering:
 
** "it would be nice to know when and how curators use the name service. And does any script managed by Caltech write data into the name services?"
 
** "what needs to be done to adopt the new name service. For example, what concern do you or the curators have about adopting the new name service, what kind of tests will allow the curators to trust that the new name service is working correctly, what changes to the new name service is needed for Caltech to be able to use it."
 
* Daniela, Chris, Karen are the only Caltech curators who use it.
 
* Daniela's workflow is:

** Look up variation through OA

** Look up variation through name service (what's the URL for that?)
 
** Create object through name service, then enter information
 
** Create temporary entry for OA through Caltech postgres services
 
* Juancarlos has emailed Sibyl, Matt, and Caltech with what we know so far. Karen and Chris, please reply if you have other name service uses beyond what Daniela does.

* Name service Google/WormBase login doesn't work right now for Caltech curators; Matt has been asked to add people.
 
 
 
===Anatomy term names===
 
* While going through the concise description text, Daniela has noted down some terms that could benefit from having a more descriptive name.
 
** e.g.: Pn.p hermaphrodite -> Pn.p hermaphrodite vulval precursor cell. e.g. P3.p hermaphrodite (WBbt:0008112) -> P3.p hermaphrodite vulval precursor cell
 
List of other potential terms:
 
<pre>
 
AB -> embryonic founder cell AB
 
EMS -> embryonic founder cell EMS -> should the definition of EMS be changed from ‘Embryonic cell’ to ‘Embryonic founder cell’? See WormAtlas founder cell definition.
 
C -> embryonic founder cell C
 
D ->  embryonic founder cell D
 
E ->  embryonic founder cell E
 
MS -> embryonic founder cell MS
 
Psub1 -> embryonic founder cell Psub1 (or simply embryonic founder cell P1)
 
Psub2 -> embryonic founder cell Psub2
 
Psub3 -> embryonic founder cell Psub3
 
Psub4 -> germline founder cell Psub4
 
G1 -> Postembryonic blast cell G1
 
G2 -> Postembryonic blast cell G2
 
</pre>
 
 
 
* We will keep the names as they are.

* For the term 'Pn.p hermaphrodite', Raymond will change the name to 'Pn.p in hermaphrodite'
 
 
 
 

Revision as of 21:07, 12 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • In the example shown, a "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention); does not depend on distribution of sets
  • Applied to various data types
  • Analysis: 10-fold cross validation
    • Randomly select 10% of the positives and negatives (without replacement) as a test set, and repeat until all papers have been sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than random negative
  • AUC values for many WB data types range from the upper 80s into the 90s (percent)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
  • If SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
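
The evaluation points above can be illustrated with a short sketch. It shows why precision and F-value shift with the ratio of positives to negatives while the classifier itself is unchanged, computes AUC directly from its pairwise definition, and deals papers into ten disjoint test folds. All function names and numbers here are illustrative assumptions, not the pipeline's actual code or results. The G-value is not defined in these notes, so it is not reproduced.

```python
import random


def precision(tpr, fpr, n_pos, n_neg):
    """Expected precision for fixed true/false positive rates on a test set
    with n_pos positives and n_neg negatives."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return tp / (tp + fp)


def f_value(tpr, fpr, n_pos, n_neg):
    """F1: harmonic mean of precision and recall (recall equals TPR)."""
    p = precision(tpr, fpr, n_pos, n_neg)
    return 2 * p * tpr / (p + tpr)


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive scores
    higher; ties count as half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def ten_folds(pos_ids, neg_ids, k=10, seed=0):
    """Shuffle once, then deal positives and negatives into k disjoint test
    folds so that every paper is held out exactly once."""
    rng = random.Random(seed)
    pos, neg = list(pos_ids), list(neg_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]


# A classifier that calls everything positive (TPR = FPR = 1) gets a
# precision equal to the fraction of positives in the test set.
print(precision(1.0, 1.0, 65, 35))

# Same classifier (TPR 0.8, FPR 0.1), different positive/negative ratios.
for n_neg in (100, 300, 900):
    print(n_neg, round(precision(0.8, 0.1, 100, n_neg), 3),
          round(f_value(0.8, 0.1, 100, n_neg), 3))

# AUC depends only on how positives rank against negatives.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3]))

folds = ten_folds(range(25), range(100, 140))
print(len(folds), folds[0])
```

Because AUC is computed from pairwise rankings only, duplicating every negative score leaves it unchanged, which is the sense in which it does not depend on the positive/negative ratio.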

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Definitions include meanings or words:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below the graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
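
The idea behind a two-gene comparison can be sketched as a set difference over annotated term IDs. This is not the prototype Raymond and Juancarlos built, whose implementation is not described here. The function name, the gene lists, and the EX: identifiers are hypothetical.

```python
def diff_annotations(terms_a, terms_b):
    """Compare the ontology term IDs annotated to two genes."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard similarity: 1.0 means identical annotation sets.
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Hypothetical annotations for two genes.
gene_a = ["EX:0001", "EX:0002", "EX:0003"]
gene_b = ["EX:0002", "EX:0003", "EX:0004", "EX:0005"]

result = diff_annotations(gene_a, gene_b)
for key, value in result.items():
    print(key, value)
```

A low similarity from a comparison like this does not by itself show that two genes differ biologically, since one gene may simply have been annotated more thoroughly than the other.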