Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(470 intermediate revisions by 9 users not shown)
Line 16: Line 16:
  
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
 +
 +
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
  
Line 21: Line 23:
  
  
= 2018 Meetings =
+
= 2019 Meetings =
 +
 
 +
[[WormBase-Caltech_Weekly_Calls_January_2019|January]]
  
[[WormBase-Caltech_Weekly_Calls_January_2018|January]]
+
[[WormBase-Caltech_Weekly_Calls_February_2019|February]]
  
[[WormBase-Caltech_Weekly_Calls_February_2018|February]]
+
[[WormBase-Caltech_Weekly_Calls_March_2019|March]]
  
[[WormBase-Caltech_Weekly_Calls_March_2018|March]]
+
[[WormBase-Caltech_Weekly_Calls_April_2019|April]]
  
[[WormBase-Caltech_Weekly_Calls_April_2018|April]]
+
[[WormBase-Caltech_Weekly_Calls_May_2019|May]]
  
[[WormBase-Caltech_Weekly_Calls_May_2018|May]]
+
[[WormBase-Caltech_Weekly_Calls_June_2019|June]]
  
[[WormBase-Caltech_Weekly_Calls_June_2018|June]]
+
[[WormBase-Caltech_Weekly_Calls_July_2019|July]]
  
 +
[[WormBase-Caltech_Weekly_Calls_August_2019|August]]
  
== July 5, 2018 ==
 
  
=== Community Curation Mass Email update ===
+
== September 12, 2019 ==
* Emails (1 per paper) sent (most batches 75 at a time): 3,227 total
 
** June 11: 5
 
** June 20: 15
 
** June 21: 82
 
** June 22: 6
 
** June 25: 75
 
** June 26: 80
 
** June 28: 363
 
** June 29: 600
 
** June 30: 300
 
** July 1: 450
 
** July 2: 1,251
 
** Email success rate:
 
*** 482 emails bounced (15%)
 
**** 74 bounced emails had a backup email (2%)
 
**** 292 bounced emails had no backup (12%)
 
**** 88% of emails were successfully sent
 
*** 52 out of office replies (2%)
 
* Total community annotations received since June 11: 718 (115 papers)
 
** June 13: 24 (3 papers)
 
** June 14: 8 (3 papers)
 
** June 17: 1 (1 paper)
 
** June 20: 13 (3 papers)
 
** June 21: 13 (4 papers)
 
** June 25: 13 (4 papers)
 
** June 26: 20 (7 papers)
 
** June 28: 17 (8 papers)
 
** June 29: 131 (8 papers)
 
** June 30: 4 (2 papers)
 
** July 1: 24 (5 papers)
 
** July 2: 184 (36 papers)
 
** July 3: 179 (20 papers)
 
** July 4: 27 (9 papers)
 
** July 5: 62 (pre-meeting) (4 papers)
 
* We may want to give people the option to "opt-out" either for a paper or for all emails or both
 
* May be good to ask SAB and users at meetings how they feel about these email requests
 
* Is it worthwhile to send requests about same paper to same people after a wait period?
 
  
=== SAB Literature Curation overview ===
+
=== Update on SVM pipeline ===
* Outline: https://docs.google.com/document/d/1OUHoFYC3deEmZvgzmOtmxpt5iBrVio9sAZG66kRHwsE/edit?usp=sharing
+
* New SVM pipeline: more analysis and more parameter tuning
* Want to ask about:
+
* avoiding precision (and F-value) as a measure (dependent on ratio of positives and negatives in test set)
** Data type history, current types, priorities going forward?
+
* For example shown, "dumb" machine starts out with precision above 0.6
** Community curation? Pilot results, should we expand?
+
* G-value (Michael's invention); does not depend on distribution of sets
** Data type vs. topic-based curation?
+
* Applied to various data types
** Val Wood and PomBase have found topic-based curation to be more efficient
+
* Analysis: 10-fold cross validation
* Alliance working groups:  
+
** Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
** data type focused
+
* F-value changes over different p/n values; G-value does not (essentially flat)
** Has affected our curation approach
+
* Area Under the Curve (AUC): probability that a random positive scores higher than random negative
 +
* AUC values for many WB data types upper 80%'s into 90%'s
 +
* Ranjana: How many papers for a good training set? Michael: we don't know yet
 +
* Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
 +
* If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
 +
* Michael can provide training sets he has used recently
  
=== WS267 Citace upload ===
+
=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
* July 20th, upload to Hinxton
+
* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
* July 17th, local CIT upload
+
* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
 +
* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
 +
* Definitions include meanings or words:
 +
** "Variations in the ability"
 +
** "aberrant"
 +
** "defect"
 +
** "defective"
 +
** "defects"
 +
** "deficiency"
 +
** "deficient"
 +
** "disrupted"
 +
** "impaired"
 +
** "incompetent"
 +
** "ineffective"
 +
** "perturbation that disrupts"
 +
** Failure to execute the characteristic response = abnormal?
 +
** abnormal
 +
** abnormality leading to specific outcomes
 +
** fail to exhibit the same taxis behavior = abnormal?
 +
** failure
 +
** failure OR delayed
 +
** failure, slower OR late
 +
** failure/abnormal
 +
** reduced
 +
** slower
  
=== Model changes ===
+
=== Citace upload ===
* Some minor tag name changes
+
** Tuesday, Sep 24th
* Changes to GO_annotation to accommodate RO terms
 
  
=== Curation Status Form: flagged vs. validated papers ===
+
=== Strain to ID mapping ===
* Papers validated by curators are not necessarily considered flagged
+
* Waiting on Hinxton to send strain ID mapping file?
* Author first pass forms: prepopulated with SVM, string-matches
+
* Hopefully we can all get that well before the upload deadline
* Curation First Pass: are forms still used?
+
* Will do global replacement at time of citace upload (at least for now)
* TFP (Textpresso first pass) not currently used?
 
  
 +
=== New name server ===
 +
* When will this officially go live?
 +
* Will we now be able to request strain IDs through the server? Yes
  
== July 19, 2018 ==
+
=== SObA Graphs ===
 +
* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
 +
* A lot of whitespace padding above and below graph; maybe trim? trimming vertically would ultimately limit the view pane when user wants to zoom in, so we should leave as is for now
 +
* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
 +
** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
  
=== PomBase SObA ===
 
* Val Wood would be interested in using SObA for PomBase
 
* Some immediate action items to deal with for expanding SObA to other ontologies and databases
 
* Maybe highlight leaf nodes with more granular information? Already available in unweighted view
 
  
=== Outreach ===
+
== September 19, 2019 ==
* SAB suggested continuing site visits for WormBase tutorials
 
* SAB suggested host institution provide funds; we can at least ask
 
  
== July 26th, 2018 ==
+
=== Strains ===
 +
* Need to wait for new strain IDs from Hinxton before running dumping scripts
 +
* Don't edit multi-ontology strain fields in OA for now!
 +
* Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
 +
* "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now
  
=== Temporarily Withdrawn Paper (WBPaper00054672) ===
+
=== Alliance literature curation ===
*A recently epublished Genetics paper (2018 Jun 26)  "...has been temporarily removed at the authors' request, to allow review of a companion article. The article 301078 will appear in a future issue of GENETICS."
+
* Working group will be formed soon
*We already have the PDF in postgres (as of 2018-07-10), but I don't see the paper in TPC or the curation status form.
+
* Will work out general common pipelines for literature curation
*Is there anything we should do here? Temporarily remove the PDF from postgres?
 
  
=== Expression display on the anatomy page ===
+
=== SObA Graph relations ===
* Suggested last week to have a way to display genes and associated evidences on the anatomy page. Rank order by number of evidences.
+
* Currently only integrating over "is a", "part of" and "regulates"
* Mock here: https://docs.google.com/presentation/d/1HE_QhJrs5oScmORdiAAdTxfavAnF6NUQhtJGWvcC7EQ/edit#slide=id.p2
+
* Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
  
=== SAB Follow Up ===
+
=== Author First Pass ===
*SAB Report (Paul sent around a Word doc)
+
* Putting together paper for AFP
 +
* Reviewing all user input for paper
 +
* Asking individual curators to check input

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • For example shown, "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention); does not depend on distribution of sets
  • Applied to various data types
  • Analysis: 10-fold cross validation
    • Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than random negative
  • AUC values for many WB data types are in the upper 80s to the 90s (percent)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
  • If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
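
The points above about precision, AUC, and 10-fold cross validation can be illustrated with a minimal sketch. This is not the pipeline's code: the function names, the tie handling in the AUC, and the fold-dealing strategy are all assumptions made for illustration. The G-value is not sketched because the notes do not give its definition.

```python
import random


def precision(tp, fp):
    """Precision = TP / (TP + FP); 0.0 if nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def always_positive_precision(n_pos, n_neg):
    """Precision of a 'dumb' classifier that flags every paper as positive.

    Every positive is a true positive and every negative a false positive,
    so the result is simply the fraction of positives in the test set.
    """
    return precision(n_pos, n_neg)


def auc_pairwise(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties are counted as half a win (an assumption; the notes do not say).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def ten_fold_splits(pos_ids, neg_ids, k=10, seed=0):
    """Deal shuffled positives and negatives into k test folds without replacement.

    Each fold holds roughly 1/k of the positives and 1/k of the negatives,
    and every paper lands in exactly one fold.
    """
    rng = random.Random(seed)
    pos, neg = list(pos_ids), list(neg_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]


if __name__ == "__main__":
    # Same 'dumb' classifier, two different positive/negative ratios.
    print(always_positive_precision(60, 40))
    print(always_positive_precision(10, 90))
    print(auc_pairwise([0.9, 0.8], [0.1, 0.85]))
```

Under these assumptions, the always-positive classifier's precision moves with the class ratio of the test set alone, which is the dependence the notes describe, while the pairwise AUC only compares scores of positives against negatives.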

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Definitions include meanings or words:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)
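
A global replacement of strain names by strain IDs could look roughly like the sketch below. The mapping file format (tab-separated name and ID, one pair per line), the function names, and the example names and IDs are all hypothetical; the actual file from Hinxton may differ.

```python
def load_mapping(lines):
    """Parse 'name<TAB>id' lines into a dict, skipping blanks and '#' comments.

    The tab-separated layout is an assumption, not the real file format.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, strain_id = line.split("\t")
        mapping[name] = strain_id
    return mapping


def replace_strains(names, mapping):
    """Replace each strain name with its ID; keep and report unmapped names."""
    replaced, unmapped = [], []
    for name in names:
        if name in mapping:
            replaced.append(mapping[name])
        else:
            replaced.append(name)  # left as-is so nothing is silently dropped
            unmapped.append(name)
    return replaced, unmapped


if __name__ == "__main__":
    # Placeholder names and IDs, for illustration only.
    mapping = load_mapping(["# name\tid", "XX1\tWBStrain-EXAMPLE-1"])
    print(replace_strains(["XX1", "XX2"], mapping))
```

Returning the unmapped names separately makes it easy to see which entries still need an ID before the upload.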

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when the user wants to zoom in, so we should leave as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating over "is a", "part of" and "regulates"
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
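
The idea of letting users choose which relations to integrate over can be sketched as an ancestor traversal that only follows the selected relation types. This is an illustrative sketch, not the SObA implementation; the edge representation, the function name, and the toy term names are assumptions.

```python
from collections import deque


def ancestors(term, edges, relations):
    """Collect all ancestors of `term` reachable via the allowed relations.

    edges: dict mapping a term to a list of (relation, parent) pairs.
    relations: set of relation names to follow, e.g. {"is_a", "part_of"}.
    """
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Toy ontology with made-up term names.
    edges = {
        "a": [("is_a", "b"), ("regulates", "c")],
        "b": [("part_of", "d")],
        "c": [("is_a", "e")],
    }
    print(ancestors("a", edges, {"is_a", "part_of", "regulates"}))
    print(ancestors("a", edges, {"is_a", "part_of"}))  # "regulates" excluded
```

Dropping "regulates" from the set removes both the regulated term and everything reachable only through it, which is the effect excluding that relation would have on the graph.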

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input