Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(404 intermediate revisions by 9 users not shown)
Line 16: Line 16:
  
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
 +
 +
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
  
Line 21: Line 23:
  
  
= 2018 Meetings =
+
= 2019 Meetings =
 
 
[[WormBase-Caltech_Weekly_Calls_January_2018|January]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_February_2018|February]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_March_2018|March]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_April_2018|April]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_May_2018|May]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_June_2018|June]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_July_2018|July]]
 
 
 
 
 
== August 2, 2018 ==
 
 
 
=== AFP ===
 
 
 
* The AFP pipeline is currently emailing authors from Karen's e-mail address
 
* Use same e-mail account Chris is using for phenotype community curation requests or create a new account for AFP (gmail)
 
* Can use outreach@wormbase.org for consistency
 
* May use the PMID in the subject line so e-mails will not be all in the same thread
 
* Todd and Chris have email credentials
 
** Chris will send to Valerio, Juancarlos, Daniela, and Kimberly
 
* Let Valerio and Juancarlos know what pipelines use AFP before they modify
 
* Do curators still want to receive emails when authors flag their data type?
 
** We will leave the alert emails as is for now
 
 
 
 
 
== August 9, 2018 ==
 
  
=== AFP ===
+
[[WormBase-Caltech_Weekly_Calls_January_2019|January]]
* Mei Zhen, an SAB member, suggested that we include disease models in the AFP form.
 
* The AFP group will work with Ranjana to incorporate it. Ranjana will prepare a mock by next week.
 
* We will then decide about using the existing afp_humdis tables or creating new ones.
 
  
=== Tazendra ===
+
[[WormBase-Caltech_Weekly_Calls_February_2019|February]]
  
* Shall we move tazendra.caltech.edu to the cloud? Either WormBase cloud or Caltech cloud?
+
[[WormBase-Caltech_Weekly_Calls_March_2019|March]]
  
 +
[[WormBase-Caltech_Weekly_Calls_April_2019|April]]
  
 +
[[WormBase-Caltech_Weekly_Calls_May_2019|May]]
  
== August 16, 2018 ==
+
[[WormBase-Caltech_Weekly_Calls_June_2019|June]]
  
=== Tazendra ===
+
[[WormBase-Caltech_Weekly_Calls_July_2019|July]]
* Moving to cloud? To avoid local hardware issues?
 
* Need to discuss with Juancarlos and Paul S.
 
* Need to consider logistics; put all of Tazendra functionality on cloud? Keep some things local?
 
** Postgres in cloud; forms local? Paper pipeline?
 
** Will consult with Textpresso
 
  
=== ICBO 2018 recap ===
+
[[WormBase-Caltech_Weekly_Calls_August_2019|August]]
* POTATO workshop (Phenotype Ontologies Traversing All The Organisms)
 
** Will work towards generating standardized logical definitions using Dead Simple OWL Design Patterns (DOSDP)
 
*** <Quality> and inheres_in some <Entity> (and has_modifier some <Mod>)
 
*** Exercise: Reconciling logical definitions for apparently equivalent phenotype terms across ontologies (e.g. MP vs. HP)
 
** Can use Protege to edit the OWL ontology and ROBOT for automating generation of many terms and logical definitions in parallel
 
** Will try to align WPO to UPheno as best we can; this will depend heavily on alignment with Uberon for anatomy
 
** Some Uberon alignment challenges: e.g. Fruit fly "tibia" and human "tibia"; human "tibia" parent is "bone" but fly "tibia" is not a bone
 
** Will participate in Phenotype Ontology Developer's call, every 2 weeks on Tuesdays (9am Pacific, 12pm East coast, 5pm UK)
 
*** Next meeting September 4, 2018
 
** Crash course in Protege, ROBOT, Ontology Development Kit, using GitHub to help develop OWL ontologies
 
** PATO needs work
 
** Questions that arose:
 
*** What should the scope of an ontology term be? Context? Life stage? Conditions? Treatment?
 
*** Being wary of ontology term count explosion; what's the right balance?
 
*** When defining phenotype terms, should the cause be included or only the observation? Maybe causes as a subclass (and assuming the observation includes assessment of cause)
 
** Some distinction between human phenotype terms and model organism terms: phenotype of individual vs. population
 
* Xenbase is trying to develop a phenotype ontology (spoke with Troy Pell, developer)
 
** Asked about WPO and how we curate
 
* Lots of plant talks
 
* Many talks on performing quality checks on ontology development and ontology re-use
 
* Domain Informational Vocabulary Extraction (DIVE) tool
 
** Entity recognition/extraction
 
** Working with two plant journals
 
** Tries to identify co-occurrence patterns of words
 
** Web interface and curation tool
 
* Semantic similarity tools and evaluation of them
 
  
=== WormBase Phenotype Ontology working group ===
 
* Chris will send around Doodle poll
 
* Goal is to discuss creation of logical definitions and alignment of phenotypes for Alliance
 
  
 +
== September 12, 2019 ==
  
== August 23, 2018 ==
+
=== Update on SVM pipeline ===
 +
* New SVM pipeline: more analysis and more parameter tuning
 +
* Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
 +
* For example shown, "dumb" machine starts out with precision above 0.6
 +
* G-value (Michael's invention); does not depend on distribution of sets
 +
* Applied to various data types
 +
* Analysis: 10-fold cross validation
 +
** Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
 +
* F-value changes over different p/n values; G-value does not (essentially flat)
 +
* Area Under the Curve (AUC): probability that a random positive scores higher than random negative
 +
* AUC values for many WB data types are in the upper 80s into the 90s (percent)
 +
* Ranjana: How many papers for a good training set? Michael: we don't know yet
 +
* Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
 +
* If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
 +
* Michael can provide training sets he has used recently
  
=== Alliance tables ===
+
=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
* Filtering/sorting priorities
+
* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
* Open question about which tables on the Alliance website should be prioritized for acquiring sorting and filtering functionality
+
* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
 +
* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
 +
* Definitions include meanings or words:
 +
** "Variations in the ability"
 +
** "aberrant"
 +
** "defect"
 +
** "defective"
 +
** "defects"
 +
** "deficiency"
 +
** "deficient"
 +
** "disrupted"
 +
** "impaired"
 +
** "incompetent"
 +
** "ineffective"
 +
** "perturbation that disrupts"
 +
** Failure to execute the characteristic response = abnormal?
 +
** abnormal
 +
** abnormality leading to specific outcomes
 +
** fail to exhibit the same taxis behavior = abnormal?
 +
** failure
 +
** failure OR delayed
 +
** failure, slower OR late
 +
** failure/abnormal
 +
** reduced
 +
** slower
  
=== Worm Phenotype Ontology working group ===
+
=== Citace upload ===
* Gary S., Karen, Kimberly, and Chris have responded to [https://doodle.com/poll/xzkxet8sb57enver#table Doodle poll]
+
** Tuesday, Sep 24th
* Looks like 12pm Pacific (3pm Eastern) on Thursdays is the time that works for everyone
 
** May start late on days when WB CIT meeting goes past 12pm Pacific
 
** May want to start a bit past 12pm to allow west coasters to get lunch, etc.?
 
* Goals:
 
** Work on logical definitions for WPO terms
 
** Consider any restructuring of WPO that would facilitate ontology alignment with other MODs and UPheno
 
** Could we eventually create a phenotype annotation tool (and term requester) that allows modular expressions of a phenotype observation to lookup existing terms or create new terms with logical definitions based on those modular elements?
 
  
=== Alliance anatomy ===
+
=== Strain to ID mapping ===
* Data quartermasters and expression working group are looking to get updated anatomy-Uberon mappings
+
* Waiting on Hinxton to send strain ID mapping file?
* How frequent are data updates at the Alliance? Seems to be every ~2 months
+
* Hopefully we can all get that well before the upload deadline
* Anatomy-Uberon mappings will affect phenotype ontology alignments
+
* Will do global replacement at time of citace upload (at least for now)
  
=== Automated Gene Descriptions ===
+
=== New name server ===
* Ranjana and Valerio are working to finish the new pipeline for automated descriptions for WB; they aim to finish it for the next upload, WS268.
+
* When will this officially go live?
* Working on one of the last data types--tissue expression; will use the anatomy ontology to perform logical trimming for the annotation set of cell/anatomy types (for each gene) including neurons (as opposed to using a file for neuronal term groupings taken from Oliver Hobert paper in the old pipeline)
+
* Will we now be able to request strain IDs through the server? Yes
* Currently experimenting with thresholds to see how the sentences look; will feed back any ontology-related issues to Raymond
 
* Working on incorporating feedback from users for information-poor genes (defined as genes with no human orthology and no GO annotations). Will include other types of information suggested by users, such as human ortholog function and protein domains.
 
* When no other data is available, will include expression cluster data.  Users have complained that they don't find this data useful as it's non-specific and from large scale studies, so will give it the lowest priority for inclusion.
 
*Suggestion to exclude the writing and storing of the thousands of automated descriptions to the Postgres database; there is really no advantage in them being in Postgres. 
 
* At the time of generation of the automated descriptions, the related .ace files can also be generated, though these will need to include the 6000+ manual descriptions that live in Postgres. So we will need to rethink this part a bit, though skipping Postgres will reduce the number of manual steps in the pipeline, and Postgres will have less data that needs to be uploaded to and downloaded from future cloud storage.
 
  
== August 30, 2018 ==
+
=== SObA Graphs ===
 +
* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
 +
* A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when the user wants to zoom in, so we should leave as is for now
 +
* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
 +
** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
  
=== EPIC dataset in the Alliance import ===
 
  
* the EPIC dataset has fine-grained anatomy-life stage annotations (e.g. single cell per minute)
+
== September 19, 2019 ==
* This generates thousands of annotations per gene (up to 30,000) for the 127 genes analyzed in the study
 
* How should we deal with this? In WB we do not display anatomy/life stage pairings; instead we display one list for anatomy terms and one for life stage. Also, we display the EPIC study in a separate panel on the gene page so that it does not ‘dilute’ small-scale annotations (a concern raised by Oliver H at the time of the import).
 
  
https://www.wormbase.org/species/c_elegans/gene/WBGene00015143#1--10
+
=== Strains ===
 +
* Need to wait for new strain IDs from Hinxton before running dumping scripts
 +
* Don't edit multi-ontology strain fields in OA for now!
 +
* Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
 +
* "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now
  
*Possible solutions:
+
=== Alliance literature curation ===
** 1. Throw away the pairing information. E.g. for WBGene00020093, there are 10713 paired annotations from expression pattern Expr10421. On WormBase, we have a panel for Expr10421 (on the page for WBGene00020093) that shows 413 life-stage associations and 112 anatomy associations, with no pairing information.
+
* Working group will be formed soon
***This approach will still give big tables (annotation in the hundreds) for the analyzed genes and the dilution problem will still be there.
+
* Will work out general common pipelines for literature curation
*** This can be implemented for 2.1, as changing the code for 2.0 is fairly involved; Kevin to do upload tomorrow
 
** 2. Assign a high-level life stage term (embryo) to the EPIC expression patterns for the Alliance import so they will be discoverable on the Alliance website and will be hyperlinked to the detailed WormBase records
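
Option 1 above (discarding the pairing information) can be sketched as follows. This is only an illustration with made-up term names; `unpair` and the toy annotations are hypothetical and are not part of any WormBase or Alliance pipeline.

```python
# Hypothetical paired annotations for one expression pattern:
# each record is (anatomy term, life stage term).
paired = [
    ("cell_1", "stage_t1"),
    ("cell_1", "stage_t2"),
    ("cell_2", "stage_t2"),
    ("cell_3", "stage_t2"),
]


def unpair(annotations):
    """Drop the pairing and return two independent, de-duplicated lists."""
    anatomy = sorted({a for a, _ in annotations})
    stages = sorted({s for _, s in annotations})
    return anatomy, stages


anatomy_terms, stage_terms = unpair(paired)
print(len(paired), "paired annotations ->",
      len(anatomy_terms), "anatomy terms and",
      len(stage_terms), "life stage terms")
```

The same kind of reduction is what turns the 10713 paired annotations for Expr10421 into 112 anatomy associations and 413 life-stage associations, at the cost of no longer knowing which cell was observed at which time point.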
 
  
=== Update on AFP Form ===
+
=== SObA Graph relations ===
 +
* Currently only integrating over "is a", "part of" and "regulates"
 +
* Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
  
*Daniela, Juancarlos, Kimberly, and Valerio have been working on the new iteration of the AFP form; they'll present the updates and we can discuss remaining development
+
=== Author First Pass ===
*Currently have six test papers with varying amounts and types of information that we're using to test the functionality and performance
+
* Putting together paper for AFP
*For entity recognition, we're not able to restrict by paper section, so this presents a challenge
+
* Reviewing all user input for paper
**We are implementing a threshold value, but the actual value will likely need to be entity-specific and we will probably err on the side of precision rather than recall
+
* Asking individual curators to check input

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • For example shown, "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention); does not depend on distribution of sets
  • Applied to various data types
  • Analysis: 10-fold cross validation
    • Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than random negative
  • AUC values for many WB data types are in the upper 80s into the 90s (percent)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
  • If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
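
Two of the points above can be illustrated with a small sketch: why precision shifts with the positive/negative ratio of the test set, and what the AUC definition means in practice. The function names and numbers below are hypothetical and are not taken from the SVM pipeline. The G-value is not shown because its definition is not given in these notes.

```python
from itertools import product


def precision(tpr, fpr, n_pos, n_neg):
    """Precision of a classifier with fixed true/false positive rates
    on a test set containing n_pos positives and n_neg negatives."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return tp / (tp + fp)


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive scores higher than
    a random negative (ties count as half)."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Same classifier (TPR 0.8, FPR 0.1), two different test-set compositions.
print(precision(0.8, 0.1, n_pos=100, n_neg=100))   # balanced test set
print(precision(0.8, 0.1, n_pos=100, n_neg=1000))  # negatives dominate

# A classifier that calls everything positive, on a test set that is
# 60% positive (an assumed composition, for illustration only).
print(precision(1.0, 1.0, n_pos=60, n_neg=40))

# AUC from toy scores.
print(pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

With the classifier held fixed, precision falls from about 0.89 to about 0.44 when negatives become ten times more common, and a classifier that labels every paper positive already reaches 0.6 on a test set that is 60% positive. The pairwise AUC depends only on how positives rank against negatives, not on how many of each there are.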

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Definitions include meanings or words:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

    • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when the user wants to zoom in, so we should leave as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating over "is a", "part of" and "regulates"
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
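
A minimal sketch of what a user-selectable relation filter could look like, assuming the ontology is available as (child, relation, parent) edges. The toy edges and the `ancestors` helper are hypothetical and do not reflect the actual SObA code.

```python
from collections import deque

# Hypothetical toy ontology edges: (child, relation, parent).
EDGES = [
    ("term_A", "is a", "term_B"),
    ("term_B", "part of", "term_C"),
    ("term_A", "regulates", "term_D"),
    ("term_D", "is a", "term_E"),
]


def ancestors(term, edges, relations):
    """Collect every ancestor of `term` reachable through the allowed relations."""
    parents = {}
    for child, rel, parent in edges:
        if rel in relations:
            parents.setdefault(child, set()).add(parent)
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


all_relations = {"is a", "part of", "regulates"}
print(sorted(ancestors("term_A", EDGES, all_relations)))
print(sorted(ancestors("term_A", EDGES, all_relations - {"regulates"})))
```

In this toy graph, dropping "regulates" removes term_D and everything reachable only through it (term_E), so annotations would roll up to term_B and term_C alone.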

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input