Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 +
= Previous Years =
 +
 
[[WormBase-Caltech_Weekly_Calls_2009|2009 Meetings]]
 
[[WormBase-Caltech_Weekly_Calls_2009|2009 Meetings]]
  
 +
[[WormBase-Caltech_Weekly_Calls_2011|2011 Meetings]]
 +
 +
[[WormBase-Caltech_Weekly_Calls_2012|2012 Meetings]]
 +
 +
[[WormBase-Caltech_Weekly_Calls_2013|2013 Meetings]]
 +
 +
[[WormBase-Caltech_Weekly_Calls_2014|2014 Meetings]]
 +
 +
[[WormBase-Caltech_Weekly_Calls_2015|2015 Meetings]]
 +
 +
[[WormBase-Caltech_Weekly_Calls_2016|2016 Meetings]]
 +
 +
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
 +
 +
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
==2011 Meetings==
 
  
[[WormBase-Caltech_Weekly_Calls_February_2011|February]]
+
GoToMeeting link: https://www.gotomeet.me/wormbase1
  
[[WormBase-Caltech_Weekly_Calls_March_2011|March]]
 
  
 +
= 2019 Meetings =
  
 +
[[WormBase-Caltech_Weekly_Calls_January_2019|January]]
  
==April 7, 2011==
+
[[WormBase-Caltech_Weekly_Calls_February_2019|February]]
  
 +
[[WormBase-Caltech_Weekly_Calls_March_2019|March]]
  
Transgene Model
+
[[WormBase-Caltech_Weekly_Calls_April_2019|April]]
*On Wiki
 
*Sent out to people
 
*Have a look; report any concerns
 
*Can follow on BitBucket; search for transgene; link to Wiki
 
*No objections at Caltech; Karen will send to Paul Davis
 
*Changes to ACE dumping script; Karen will talk to Juancarlos
 
*Changes needed in OA (softer deadline than dump)
 
  
 +
[[WormBase-Caltech_Weekly_Calls_May_2019|May]]
  
Interactions
+
[[WormBase-Caltech_Weekly_Calls_June_2019|June]]
*Murky genetic interaction curation?
 
*Err on the side of generality/trusting author statements
 
*When in doubt, curate as "genetic interaction"
 
*Chris is working on decision tree/pipeline for curation
 
*Kimberly working on Physical Interaction model
 
  
 +
[[WormBase-Caltech_Weekly_Calls_July_2019|July]]
  
BioGRID meeting at Princeton in May
+
[[WormBase-Caltech_Weekly_Calls_August_2019|August]]
*Call in
 
*What will Rose propose?
 
  
  
Expression Pattern Curation (Daniela/Wen)
+
== September 12, 2019 ==
*Daniela sent out picture page for review
 
*Expr Pattern OA wiki is in place:
 
**http://wiki.wormbase.org/index.php/Expression_Pattern
 
*As soon as Juancarlos is done with the modularization will start working on the code.
 
*In the meanwhile Daniela will curate expression pattern writing .ace files
 
*Expr_pattern OA should be ready by the next upload (May26th). (I really doubt this, parsing in data, writing dumpers, and checking it take a long time.  Picture and Interaction each probably took longer than 2 months, and we're not starting Expr until May at the earliest -- Juancarlos)
 
  
 +
=== Update on SVM pipeline ===
 +
* New SVM pipeline: more analysis and more parameter tuning
 +
* avoiding precision (and F-value) as a measure (dependent on ratio of positives and negatives in test set)
 +
* For example shown, "dumb" machine starts out with precision above 0.6
 +
* G-value (Michael's invention); does not depend on distribution of sets
 +
* Applied to various data types
 +
* Analysis: 10-fold cross validation
 +
** Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
 +
* F-value changes over different p/n values; G-value does not (essentially flat)
 +
* Area Under the Curve (AUC): probability that a random positive scores higher than random negative
 +
* AUC values for many WB data types are in the upper 80s to 90s (percent)
 +
* Ranjana: How many papers for a good training set? Michael: we don't know yet
 +
* Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
 +
* If SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
 +
* Michael can provide training sets he has used recently
  
Patch file/Interbuild (Raymond)
+
=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
*Developed good patch file
+
* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
*Tested patch file to update WS224 to WS225 - seems OK
+
* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
*Less than 5 minutes for upload
+
* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
*Testing now should be done by Todd/OICR team
+
* Definitions include meanings or words:
 +
** "Variations in the ability"
 +
** "aberrant"
 +
** "defect"
 +
** "defective"
 +
** "defects"
 +
** "deficiency"
 +
** "deficient"
 +
** "disrupted"
 +
** "impaired"
 +
** "incompetent"
 +
** "ineffective"
 +
** "perturbation that disrupts"
 +
** Failure to execute the characteristic response = abnormal?
 +
** abnormal
 +
** abnormality leading to specific outcomes
 +
** fail to exhibit the same taxis behavior = abnormal?
 +
** failure
 +
** failure OR delayed
 +
** failure, slower OR late
 +
** failure/abnormal
 +
** reduced
 +
** slower
  
 +
=== Citace upload ===
 +
* Tuesday, Sep 24th
  
Uma started
+
=== Strain to ID mapping ===
*Working on concise descriptions of gene classes
+
* Waiting on Hinxton to send strain ID mapping file?
*Karen has reviewed with Uma; Uma is reading papers
+
* Hopefully we can all get that well before the upload deadline
*Discussing details of descriptions
+
* Will do global replacement at time of citace upload (at least for now)
*Inconsistencies/discrepancies of gene class names
 
*>2400 gene classes
 
*Can work on generating formula for this curation
 
*Arun can help with automation
 
*May need to get Uma an interface to enter data into postgres
 
*Adapt concise description CGI for her? (probably write a whole new interface depending on goal -- Juancarlos)
 
*Gene class name and a text field
 
*Using Textpresso/WormMart output; sentence saver?
 
  
 +
=== New name server ===
 +
* When will this officially go live?
 +
* Will we now be able to request strain IDs through the server? Yes
  
eggNOG data into citace?
+
=== SObA Graphs ===
*Who's going to handle the data? curate?
+
* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
*Michael? OK
+
* A lot of whitespace padding above and below each graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
 +
* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
 +
** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
  
  
 +
== September 19, 2019 ==
  
==April 14, 2011==
+
=== Strains ===
 +
* Need to wait for new strain IDs from Hinxton before running dumping scripts
 +
* Don't edit multi-ontology strain fields in OA for now!
 +
* Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
 +
* "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now
  
-Gene Class Descriptions-
+
=== Alliance literature curation ===
*Concerns about maintenance and redundancy
+
* Working group will be formed soon
*Uma here for ~ 3 months
+
* Will work out general common pipelines for literature curation
*How many gene classes have alleles?
 
*How many are named by phenotype rather than just molecular data?
 
*How is this different from gene concise descriptions?
 
*Should it be a summary of all gene concise descriptions of the class?
 
*Things currently focused on:
 
**using WormMart to look at genes in a class
 
**pulls out all concise descriptions
 
**look at similarities
 
**interesting things to highlight
 
*Gene concise descriptions vs class descriptions
 
**Gene-centric vs Class-centric
 
**Consolidating/pooling all concise descriptions from individual genes?
 
*Going for maintenance-free statements
 
*Potentially building an interface
 
*Richard Durbin: development vs behavior?
 
*Prioritization?
 
*Focus on phenotype-based classes like UNC?
 
*Factors for prioritization:
 
**Numbers of genes curated
 
**molecular vs phenotype-based
 
**Amount of info currently available?
 
**Historical points
 
**Most actively worked currently? (most mentioned in last year's publications?)
 
*Uma and Karen could communicate with Kimberly and Ranjana about
 
*What is most efficient for Uma to focus on?
 
*Uma can look at gene class description makes sense
 
*Skip gene classes for which only one gene exists
 
*GO term stats on each class?
 
  
 +
=== SObA Graph relations ===
 +
* Currently only integrating over "is a", "part of" and "regulates"
 +
* Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
  
-Papers missing from Textpresso-
+
=== Author First Pass ===
*Issue: Genetics papers for GSA markup are missing from SVM analysis
+
* Putting together paper for AFP
*Juancarlos' file on caprica
+
* Reviewing all user input for paper
*Discrepancy between papers on Textpresso and those gone through SVM
+
* Asking individual curators to check input
*SVM doesn't pick up GSA papers
 
*Generate a filtering to detect which ones have been missed by SVM
 
*Michael looking into reasons why the pipeline isn't working
 
*Tazendra vs Textpresso discrepancies?
 
*Ruihua will process 56 missing papers retroactively
 
*Still working on how to avoid this in the future
 

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • In the example shown, even a "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention); does not depend on distribution of sets
  • Applied to various data types
  • Analysis: 10-fold cross validation
    • Randomly select 10% of the positives and 10% of the negatives (without replacement), and repeat until all papers have been sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than a random negative
  • AUC values for many WB data types are in the upper 80s to 90s (percent)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael with better training sets if you want an improved SVM
  • If SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
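
The points above about precision, cross validation, and AUC can be illustrated with a small sketch. This is not Michael's pipeline: the function names, the toy scores, and the true/false positive rates are all assumptions chosen for illustration. The G-value is not defined in these notes, so it is not implemented here.

```python
import random


def precision(tpr, fpr, n_pos, n_neg):
    """Precision of a classifier with fixed per-class rates (tpr, fpr)
    on a test set holding n_pos positives and n_neg negatives."""
    true_pos = tpr * n_pos
    false_pos = fpr * n_neg
    return true_pos / (true_pos + false_pos)


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive
    scores higher; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def stratified_folds(positives, negatives, k=10, seed=0):
    """Shuffle positives and negatives separately, then deal each into
    k disjoint folds, so every paper is sampled exactly once."""
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]


if __name__ == "__main__":
    # A "dumb" machine that calls everything positive (tpr = fpr = 1)
    # gets precision equal to the fraction of positives in the test set.
    print(precision(1.0, 1.0, n_pos=60, n_neg=40))
    # Same classifier rates, different positive/negative ratios:
    print(precision(0.8, 0.1, n_pos=50, n_neg=50))
    print(precision(0.8, 0.1, n_pos=10, n_neg=90))
    # Pairwise AUC on toy scores.
    print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

With the classifier's rates held fixed, precision falls as negatives come to dominate the test set. The pairwise AUC uses only the ordering of scores between the two classes, so it does not depend on that ratio.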

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Existing term definitions use the following words or meanings:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below each graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
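
As a rough illustration of what a two-gene comparison involves, here is a minimal sketch that diffs two sets of annotation terms. It is not the prototype Raymond and Juancarlos built; the function name and the placeholder term labels are assumptions.

```python
def diff_annotations(terms_a, terms_b):
    """Split the annotation terms of two genes into shared terms and
    terms found on only one of the genes."""
    a, b = set(terms_a), set(terms_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


if __name__ == "__main__":
    # Placeholder term labels, not real ontology terms.
    gene_a_terms = ["term_1", "term_2", "term_3", "term_4"]
    gene_b_terms = ["term_2", "term_3"]
    result = diff_annotations(gene_a_terms, gene_b_terms)
    for key, terms in result.items():
        print(key, terms)
```

A large "only_a" list next to a short list for the other gene is the pattern Paul describes: it can mean one gene is simply annotated more thoroughly, not that the genes differ biologically.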


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA is not dumped, so no need to worry about it right now
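
The mapping step could look something like the sketch below, assuming the mapping file can be loaded as a simple name-to-ID lookup. The function name, the strain names used as keys, and the placeholder IDs are all hypothetical; the real file format and ID scheme depend on what Hinxton sends.

```python
def map_strains(entries, name_to_id):
    """Replace strain names with IDs where the mapping knows them;
    collect unknown entries separately for manual review."""
    mapped, unmapped = [], []
    for entry in entries:
        key = entry.strip()
        if key in name_to_id:
            mapped.append(name_to_id[key])
        else:
            unmapped.append(entry)
    return mapped, unmapped


if __name__ == "__main__":
    # Hypothetical mapping; real IDs would come from the mapping file.
    name_to_id = {"strainA": "ID_0001", "strainB": "ID_0002"}
    entries = ["strainA", " strainB ", "some free text strain"]
    mapped, unmapped = map_strains(entries, name_to_id)
    print(mapped)
    print(unmapped)
```

Keeping unmatched entries in their own list, without guessing at them, leaves the free-text cases visible for curators to resolve by hand.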

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating over "is a", "part of" and "regulates"
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
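
One way to picture the proposed option is a traversal that only follows the relation types the user has selected. This is a sketch on a toy graph, not the SObA code; the edge representation, relation labels, and term names are assumptions.

```python
from collections import deque


def ancestors(term, edges, relations):
    """Collect every term reachable from `term` by following only
    edges whose relation type is in `relations`.

    edges maps a term to a list of (relation, parent) pairs."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Toy ontology: D is_a B, D regulates C, B part_of A, C is_a A.
    edges = {
        "D": [("is_a", "B"), ("regulates", "C")],
        "B": [("part_of", "A")],
        "C": [("is_a", "A")],
    }
    print(sorted(ancestors("D", edges, {"is_a", "part_of", "regulates"})))
    print(sorted(ancestors("D", edges, {"is_a", "part_of"})))
```

Dropping "regulates" from the selected set removes C from the result, so annotations to D would no longer roll up to terms reachable only through a regulatory link.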

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input