Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(267 intermediate revisions by 8 users not shown)
Previous revision, from line 23:

= 2019 Meetings =

== January 3, 2019 ==

=== WS270 Citace upload ===
* Next Tuesday, Jan 8th, 10am Pacific

=== Gene descriptions ===
* Valerio generated new files to ignore/filter out problematic genes
* Still need to validate the new pipeline
* Barring any major issues, will submit new files for WS270 (can load old files if needed)
* Maybe we should define a test set (random sample) to test each release? We already have a test set

=== Protege Tutorial ===
* Doodle poll open: https://doodle.com/poll/kn49rd3rggymn68g
* Please fill out the poll if you are interested in attending; have responses from Kimberly and Gary S.

== January 11, 2019 ==

=== WB workshop at IWM 2019 ===
Here's a draft; it needs to be finalized, as Jan 15th is the deadline
<pre style="white-space: pre-wrap;
white-space: -moz-pre-wrap;
white-space: -pre-wrap;
white-space: -o-pre-wrap;
word-wrap: break-word">

Possible Title 1: Data in WormBase and how to query it

Possible Title 2: WormBase 2019 - Data, Tools and Community Curation

This workshop will be an interactive session with users to discuss the types of data in WormBase and how to query them using specific tools. We will discuss recent changes to WormBase community annotation forms and how to use them to contribute data to WormBase. We will also present updates to ParaSite, a portal to parasitic worm genomic data, and show how to find cross-species data at the Alliance of Genome Resources.

1:00 pm
Keep your widgets open: a wealth of data on the gene page - Ranjana Kishore
(This will be a quick introduction to the gene page to orient users before we jump into the tools section)

1:05 pm
Use the right tool for the right data:
Get simple lists using SimpleMine - Wen Chen
Tools for RNA-seq data - Wen Chen
Tools for enrichment analysis - Kimberly Van Auken
Get batch gene data using the WormBase Ontology Browser - Raymond Lee
Get the big picture: visualize annotations using the SObA tool - Raymond Lee

1:45-2:00 pm
WormBase ParaSite: Exploring lots of genomes - Kevin Howe
Find cross-species data at the Alliance of Genome Resources - Chris Grove
Be a Community Curator: submit your data to WormBase - Daniela Raciti

2:00-2:30 pm
Open forum for questions
</pre>

=== Finalize Protege tutorial time ===
* Best final options:
** Wed, Jan 16th, 1pm Pacific/4pm Eastern
** Thurs, Jan 17th, 11am Pacific/2pm Eastern
** Thurs, Jan 17th, 1pm Pacific/4pm Eastern
* Propose we go with Wed, Jan 16th, 1pm Pacific/4pm Eastern

=== Automated descriptions ===
* Distinguishing information-rich vs. information-poor genes
* Information-poor genes can take advantage of information across MODs/species
* Need a more robust QC pipeline; can work on it for WormBase, and later apply it to the Alliance once worked out
* Working on expression statements for Alliance genes
* Considering rearranging the description so that disease features more prominently

=== Disease curation ===
* Working on how best to integrate molecule effects
* Considering SVM for disease; current paper flagging pipeline is insufficient
** 200 papers available as a positive training set
* Results sections are not being extracted in the latest Textpresso (paper sectioning in general is not happening)

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, because both depend on the ratio of positives to negatives in the test set
  • In the example shown, a "dumb" machine starts out with a precision above 0.6
  • G-value (Michael's invention) does not depend on the positive/negative distribution of the sets
  • Applied to various data types
  • Analysis: 10-fold cross-validation
    • Randomly select 10% of the positives and 10% of the negatives (without replacement) and repeat until all papers are sampled
  • F-value changes with the positive/negative ratio; G-value does not (essentially flat)
  • Area Under the Curve (AUC): the probability that a randomly chosen positive scores higher than a randomly chosen negative
  • AUC values for many WB data types are in the upper 80s to 90s (%)
  • Ranjana: How many papers make a good training set? Michael: we don't know yet
  • Old training sets (for the old SVM) can't be reproduced; provide Michael with better training sets if you want an improved SVM
  • If the SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide the training sets he has used recently
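The point about precision depending on the positive/negative ratio, and the pairwise definition of AUC, can be illustrated with a minimal Python sketch. The "dumb" classifier here, which flags every paper as positive with a constant score, is an illustrative assumption and not the one from Michael's analysis. The G-value is not reproduced because these notes do not define it.

```python
def precision(labels, predictions):
    """Fraction of predicted positives that are truly positive (1 = positive)."""
    predicted_pos = [y for y, p in zip(labels, predictions) if p == 1]
    return sum(predicted_pos) / len(predicted_pos) if predicted_pos else 0.0


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical "dumb" classifier: every paper flagged positive, same score for all.
for n_pos, n_neg in [(60, 40), (20, 80)]:
    labels = [1] * n_pos + [0] * n_neg
    dumb_predictions = [1] * len(labels)
    dumb_scores = [0.5] * len(labels)  # constant score, so no ranking ability
    print(n_pos, n_neg, precision(labels, dumb_predictions), auc(labels, dumb_scores))
```

For the all-positive classifier, precision simply equals the fraction of positives in the test set (0.6 vs. 0.2 here), while AUC stays at 0.5 whatever the ratio. This is why a ratio-independent measure is preferable for comparing data types.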
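Sampling 10% of the positives and 10% of the negatives without replacement until every paper has been used amounts to stratified 10-fold cross-validation. Below is a minimal sketch of that fold construction, assuming papers are plain ID strings. The function names and paper IDs are hypothetical and not taken from the SVM pipeline.

```python
import random


def stratified_folds(positives, negatives, k=10, seed=0):
    """Split papers into k folds, each with about 1/k of the positives and
    1/k of the negatives. Sampling is without replacement, so every paper
    appears in exactly one fold."""
    rng = random.Random(seed)
    pos = list(positives)
    neg = list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Taking every k-th item after shuffling keeps fold sizes within one of each other.
    return [pos[i::k] + neg[i::k] for i in range(k)]


def cross_validation_splits(positives, negatives, k=10, seed=0):
    """Yield (train, test) pairs; each fold is the test set exactly once."""
    folds = stratified_folds(positives, negatives, k, seed)
    for i, test in enumerate(folds):
        train = [paper for j, fold in enumerate(folds) if j != i for paper in fold]
        yield train, test


# Example with placeholder paper IDs: 50 positives, 200 negatives.
pos_papers = [f"pos{i}" for i in range(50)]
neg_papers = [f"neg{i}" for i in range(200)]
splits = list(cross_validation_splits(pos_papers, neg_papers))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 225 25
```

Stratifying each fold keeps the positive/negative ratio the same in every test set, so per-fold scores remain comparable.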

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vague meaning of "defective", and how it differs from "abnormal", has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Existing definitions use the following words or meanings:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Tuesday, Sep 24th

Strain to ID mapping

  • Still waiting on Hinxton to send the strain ID mapping file?
  • Hopefully we can all get it well before the upload deadline
  • Will do a global replacement at the time of the citace upload (at least for now)

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below the graph; maybe trim it? Trimming vertically would limit the view pane when users zoom in, so we will leave it as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
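At its simplest, a gene-vs-gene diff compares the sets of terms annotated to each gene. This minimal sketch assumes a hypothetical input format: a dict from gene ID to a set of term IDs, with placeholder names. It is not how Raymond and Juancarlos's prototype works, since these notes do not describe its method.

```python
def annotation_diff(gene_a, gene_b, annotations):
    """Return (shared, only_a, only_b) term sets for two genes.

    annotations maps a gene ID to the set of terms annotated to it
    (a hypothetical input format).
    """
    terms_a = annotations.get(gene_a, set())
    terms_b = annotations.get(gene_b, set())
    return terms_a & terms_b, terms_a - terms_b, terms_b - terms_a


# Example with placeholder gene and term names.
annotations = {
    "geneA": {"term1", "term2", "term3"},
    "geneB": {"term2", "term3", "term4", "term5"},
}
shared, only_a, only_b = annotation_diff("geneA", "geneB", annotations)
print(sorted(shared), sorted(only_a), sorted(only_b))
# ['term2', 'term3'] ['term1'] ['term4', 'term5']

# Total annotation counts give a rough sense of how well covered each gene is.
print(len(annotations["geneA"]), len(annotations["geneB"]))  # 3 4
```

Showing each gene's total annotation count next to the diff may help separate differences caused by uneven annotation coverage from real biological differences, as Paul suspected.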


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in the Disease OA is not dumped, so no need to worry about it right now
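Once the complete mapping file arrives, mapping curated strain entries to IDs is a dictionary lookup plus a list of leftovers for manual review. This is a minimal sketch, assuming a hypothetical tab-separated "name, ID" file format and placeholder strain names and IDs. The real Hinxton file layout and OA field formats are not described in these notes.

```python
def load_mapping(lines):
    """Parse 'name<TAB>strain_id' lines into a dict (hypothetical file format)."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, strain_id = line.split("\t")[:2]
        mapping[name] = strain_id
    return mapping


def map_strain_entries(entries, name_to_id):
    """Swap each curated strain name for its ID; collect names with no match."""
    mapped, unmapped = [], []
    for entry in entries:
        name = entry.strip()
        if name in name_to_id:
            mapped.append(name_to_id[name])
        else:
            mapped.append(entry)  # left unchanged for manual review
            unmapped.append(entry)
    return mapped, unmapped


# Example with placeholder names and IDs.
mapping = load_mapping(["# name\tid", "strainA\tID:0001", "strainB\tID:0002", ""])
mapped, unmapped = map_strain_entries(["strainA", " strainB ", "strainC"], mapping)
print(mapped)    # ['ID:0001', 'ID:0002', 'strainC']
print(unmapped)  # ['strainC']
```

Keeping unmatched free-text entries unchanged, rather than dropping them, leaves nothing lost before a curator has looked at it.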

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently integrating only over "is a", "part of" and "regulates"
  • Maybe we could give users an option to choose which relations to include, or simply exclude "regulates"
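Letting users choose which relations to integrate over amounts to filtering edges while propagating annotations up to ancestor terms. Below is a minimal Python sketch, not SObA's actual code. It assumes a toy graph format (child term mapped to (relation, parent) pairs), OBO-style relation names, and hypothetical term names. It also treats every allowed relation as freely composable, which simplifies real ontology reasoning rules.

```python
from collections import deque

DEFAULT_RELATIONS = frozenset({"is_a", "part_of", "regulates"})


def ancestors(term, edges, relations=DEFAULT_RELATIONS):
    """Return all ancestors of `term` reachable through the allowed relations.

    edges maps a child term to a list of (relation, parent) pairs.
    """
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def term_counts(annotated_terms, edges, relations=DEFAULT_RELATIONS):
    """Count annotations per term after propagating each one up the graph."""
    counts = {}
    for term in annotated_terms:
        for t in {term} | ancestors(term, edges, relations):
            counts[t] = counts.get(t, 0) + 1
    return counts


# Toy graph with hypothetical terms: T4 regulates T3, T3 is_a T2, T2 part_of T1.
edges = {
    "T3": [("is_a", "T2")],
    "T2": [("part_of", "T1")],
    "T4": [("regulates", "T3")],
}
print(sorted(term_counts(["T3", "T4"], edges).items()))
# [('T1', 2), ('T2', 2), ('T3', 2), ('T4', 1)]
print(sorted(term_counts(["T3", "T4"], edges, {"is_a", "part_of"}).items()))
# [('T1', 1), ('T2', 1), ('T3', 1), ('T4', 1)]
```

Excluding "regulates" stops the T4 annotation from inflating the counts of T3 and its ancestors. That is the kind of effect a user-selectable relation option would expose.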

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input