Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(343 intermediate revisions by 9 users not shown)
  
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]

= 2018 Meetings =
 
 
[[WormBase-Caltech_Weekly_Calls_January_2018|January]]

[[WormBase-Caltech_Weekly_Calls_February_2018|February]]

[[WormBase-Caltech_Weekly_Calls_March_2018|March]]

[[WormBase-Caltech_Weekly_Calls_April_2018|April]]

[[WormBase-Caltech_Weekly_Calls_May_2018|May]]

[[WormBase-Caltech_Weekly_Calls_June_2018|June]]

[[WormBase-Caltech_Weekly_Calls_July_2018|July]]

[[WormBase-Caltech_Weekly_Calls_August_2018|August]]

[[WormBase-Caltech_Weekly_Calls_September_2018|September]]
  
== October 4, 2018 ==

=== SimpleMine ===
* Automated descriptions will be removed from Postgres/OA
* SimpleMine needed to update where it pulls the automated descriptions from
* Will add expression cluster and automated description columns output (in addition to concise description)
* Added RNAseq FPKM download function for 9 species: http://mangolassi.caltech.edu/~azurebrd/cgi-bin/forms/fpkmmine.cgi
* Added SimpleMine-like topic search: http://tazendra.caltech.edu/%7Eazurebrd/cgi-bin/forms/spellmine.cgi
* Should put the new tools under the WormBase Tools menu
  
== October 11, 2018 ==

=== Ready for new round of phenotype requests ===
* Some users are getting confused about the name & email prepopulation based on IP address
** May want to stop autopopulating name and email or autopopulate based on email recipient only (encode in URL sent in email)
** Could we use cookies? Possibly, but may only help if a computer is shared but the browser isn't
* Current autocomplete expects exact match to person primary name; e.g. "Scott Emmons" will not match the official name "Scott Wilson Emmons"
** Maybe we could improve search matching; algorithm from Cecilia/Juancarlos? Elastic search by Valerio?
** Can we capture incomplete sessions? We may be able to learn from them. May be flooded by robot visits? Is it worth going through all the logs/sessions? Info is there if we want to look at it.
* Will go ahead and send emails for only new set of papers (won't resend requests for papers that had emails sent in June/July)
* Maybe go back to papers that already had a request sent at the 6 month time point
* Include other papers in need of curation at bottom of email; possibly, would it turn off users?
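
The autocomplete problem noted above ("Scott Emmons" failing to match "Scott Wilson Emmons") could be eased by matching on name tokens rather than on the whole string. The following is only a minimal sketch of that idea, not the algorithm used by the form; the function name `name_matches` and its matching rule (each query word must be a prefix of a later word in the official name, in order) are assumptions made for illustration.

```python
def name_matches(query, official_name):
    """Return True if every word in `query` is a prefix of a distinct word
    in `official_name`, in the same order (case-insensitive).

    An empty query matches every name, which suits an autocomplete box
    before the user has typed anything.
    """
    official_tokens = iter(official_name.lower().split())
    for query_token in query.lower().split():
        # Advance through the official name until this query word matches;
        # tokens that are skipped (e.g. a middle name) are simply ignored.
        for official_token in official_tokens:
            if official_token.startswith(query_token):
                break
        else:
            return False  # ran out of official-name words without a match
    return True
```

With this rule, `name_matches("Scott Emmons", "Scott Wilson Emmons")` is true, while a reversed query such as "Emmons Scott" is rejected; whether word order should matter is a design choice that would need to be checked against how users actually type names.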
 
  
=== Worm Phenotype Ontology ===
* WPO has a new home on GitHub
** https://github.com/obophenotype/c-elegans-phenotype-ontology
* Edits should only be made to the edit file
** https://github.com/obophenotype/c-elegans-phenotype-ontology/blob/master/src/ontology/wbphenotype-edit.owl
* Anyone interested in contributing to the WPO should contact Chris for update pipeline info
* Need to make sure that all users of the WPO have the updated link information

=== Provide provenance in query tools ===
* Prompted by user question/request
* Specifically in WOBr, Anatomy pages
* WOBr provides genes annotated to term; should provide provenance of each gene and its annotations
* Expression pattern and expression cluster gene lists (in context of Anatomy WOBr); want to provide provenance for this data
* Provenance = an object ID, like "Expr1234" or "WBPaper00032062:age_regulated_genes" with link to relevant page
 
  
=== WOBr disease associations ===
* Ranjana wondering if WOBr is using updated disease-gene associations
* Gene association file (for disease) being generated by script; likely need to update where the data is coming from
* Ranjana will discuss with Raymond and Kevin
 
  
=== New WormMine superuser ===
* Now all template queries are owned by a new superuser
* If people are interested in adding or editing templates talk to Chris or Paulo for superuser access
* We were running into login, template ownership, and consistency issues
  
=== Next upload Nov 16 ===
* WS269 citace upload Tuesday, November 13
* Can we add upload dates to Google calendar for WormBase?
  
== October 18, 2018 ==

=== Upload for WS269 ===
* To Hinxton Nov 16
* Citace upload to Wen Nov 13
  
=== Data provenance in WOBr tools ===
* Juancarlos and Raymond have been working on
* Awaiting pull request
* Can test: juancarlos.wormbase.org
** Go to WOBr and test anatomy ontology
** Now WOBr gene count results show data objects from which the associations come (Expr_pattern and Expression_cluster objects)

=== New SPELL server ===
* New server on Amazon (modifying server SGD uses)
* Raymond, Wen, and Todd working on
* Currently only have an SGD mirror running
* Wen will swap the data later today
* WormBase header link (to WormBase) or only link to Alliance site? We want a unified site for Alliance
* Each MOD would still support their own server for their data (MOD-specific grants support each server, for now)
 
  
=== Alliance expression data ===
* Anatomy-LifeStage pair required for Alliance expression annotations
* Since many expression pattern annotations don't have both, the missing entity would default to ontology root term
* Need to link anatomy root term to Uberon for ribbon display; root term annotations fall under the "Other" category in the ribbon slim
* Create "Anatomical_part" term to serve as the default "Other"/root term?
* All life-stage-only annotations will fall into anatomy "Other" and flood the list; should these be filtered out?
  
== October 25, 2018 ==

=== WormBase SPELL on Amazon Web Service ===
* http://34.224.93.60/ is running WormBase SPELL on WS267, based on the SPELL code supported by SGD. It is more stable and faster than the current Caltech server.
* Waiting for Todd to respond on whether we can use this site as the official server for WormBase SPELL.
* Also need to get an instance (preferably also from WormBase AWS) as a development site for WormBase SPELL.
* Caltech will focus on generating new data instead of SPELL code development.
 
  
=== Linking annotation evidence to Anatomy Ontology Browser ===
Each gene expression annotation shown on WOBr is linked to the object so that users can more easily examine the evidence.
[https://juancarlos.wormbase.org/tools/ontology_browser/show_genes?focusTermName=neuron&focusTermId=WBbt:0003681 Example]

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • In the example shown, a "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention); does not depend on distribution of sets
  • Applied to various data types
  • Analysis: 10-fold cross validation
    • Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than random negative
  • AUC values for many WB data types range from the upper 80s into the 90s (percent)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
  • If the SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
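
The points above about precision, AUC, and the 10-fold sampling can be illustrated with a small sketch. This is not Michael's pipeline, and it does not attempt the G-value, whose definition is not given in these notes; the function names and the choice of a plain Python implementation are assumptions made for illustration only.

```python
import random
from itertools import product


def precision(tp, fp):
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp)


def dumb_precision(n_pos, n_neg):
    """Precision of a "dumb" classifier that labels every paper positive.

    It equals the fraction of positives in the test set, so it changes
    whenever the positive/negative ratio changes.
    """
    return precision(tp=n_pos, fp=n_neg)


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win, the usual convention.
    """
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def stratified_folds(pos_ids, neg_ids, k=10, seed=0):
    """Split positives and negatives into k disjoint folds (no replacement).

    Each fold receives roughly 1/k of each class, and every paper lands in
    exactly one fold.
    """
    rng = random.Random(seed)
    pos, neg = list(pos_ids), list(neg_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]
```

For example, `dumb_precision(60, 40)` and `dumb_precision(10, 90)` differ only because the class ratio differs, whereas `pairwise_auc` gives the same value if the negative set is simply replicated, because it depends on the ranking of scores rather than on class proportions.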

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Definitions include meanings or words:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • There is a lot of whitespace padding above and below the graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA is not dumped, so no need to worry about it right now
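
The mapping step described above could look roughly like the sketch below once the mapping file arrives. The file format is not stated in these notes, so a tab-delimited "name, ID" layout is assumed here, and the function names and the example IDs in the usage note are hypothetical placeholders rather than real strain IDs.

```python
def load_mapping(lines):
    """Build a strain-name -> strain-ID dict from tab-delimited lines.

    Blank lines and lines starting with '#' are skipped (assumed format).
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, strain_id = line.split("\t")
        mapping[name] = strain_id
    return mapping


def map_entries(entries, mapping):
    """Map strain entries to IDs; return (mapped dict, list of unmapped entries)."""
    mapped, unmapped = {}, []
    for entry in entries:
        key = entry.strip()
        if key in mapping:
            mapped[key] = mapping[key]
        else:
            unmapped.append(key)
    return mapped, unmapped
```

Calling `map_entries(["N2", "XY999"], load_mapping(["N2\tID_PLACEHOLDER_1"]))` would return the one mapped entry plus a list containing "XY999", which gives curators an explicit list of free-text entries that still need attention.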

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating over "is a", "part of" and "regulates"
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
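
The idea of letting users choose which relations to integrate over can be sketched as a graph traversal that only follows selected edge types. This is an illustrative sketch, not the SObA implementation; the edge table, the term names, and the function `ancestors` are all hypothetical.

```python
from collections import deque

# Hypothetical ontology edges: child term -> list of (relation, parent term).
EDGES = {
    "term_C": [("is_a", "term_B")],
    "term_B": [("part_of", "term_A")],
    "term_D": [("regulates", "term_B")],
}


def ancestors(term, edges, relations):
    """Return every term reachable from `term` by following only the
    relation types listed in `relations` (breadth-first traversal)."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Passing `{"is_a", "part_of", "regulates"}` reproduces the current behaviour described above, while passing `{"is_a", "part_of"}` excludes "regulates"; in this toy graph that leaves `term_D` with no ancestors at all, which shows how much the rolled-up annotation counts can depend on the relation set.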

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input