Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(245 intermediate revisions by 8 users not shown)
  
 
GoToMeeting link: https://www.gotomeet.me/wormbase1
 
 
  
  
 
= 2019 Meetings =
 
  
== January 3, 2019 ==
 
 
=== WS270 Citace upload ===
 
* Next Tuesday, Jan 8th, 10am Pacific
 
 
 
=== Gene descriptions ===
 
* Valerio generated new files to ignore/filter-out problematic genes
 
* Still need to validate new pipeline
 
* Barring any major issues, will submit new files for WS270 (can load old files if needed)
 
* Should we define a test set (random sample) to check each release? We already have a test set
 
 
 
=== Protege Tutorial ===
 
* Doodle poll open: https://doodle.com/poll/kn49rd3rggymn68g
 
* Please fill out poll if you are interested in attending; have responses from Kimberly and Gary S.
 
 
 
 
 
==January 10th, 2019==
 
 
 
===WB workshop at IWM 2019===
 
Here's a draft; we need to finalize it, as Jan 15th is the deadline
 
<pre style="white-space: pre-wrap;
 
white-space: -moz-pre-wrap;
 
white-space: -pre-wrap;
 
white-space: -o-pre-wrap;
 
word-wrap: break-word">
 
  
Title: WormBase 2019 - Data, Tools and Community Curation
This workshop will be an interactive session with users to discuss the types of data in WormBase and how to query them using specific tools. We will discuss recent changes to the WormBase community annotation forms and how to use them to contribute data to WormBase. We will also present updates to ParaSite, a portal for parasitic worm genomic data, and show participants how to find data across model organisms at the Alliance of Genome Resources.
 
  
Format: 90 minutes, split into a 40-minute first section, a 20-minute second section, and a 30-minute open discussion/Q&A session. Talks in each section will also leave time for questions from the audience.
  
Section 1: Introduction to the WormBase gene page and to tools such as SimpleMine, tools for RNA-seq data and enrichment analysis, and the WormBase Ontology Browser and annotation visualization tools for gene-related data.
  
Section 2: WormBase ParaSite database, model organism data at the Alliance of Genome Resources, and community curation forms
  
Section 3: Open forum for discussion and Q&A.
</pre>
 
  
=== Finalize Protege tutorial time ===
* Best final options:
 
** Wed, Jan 16th, 1pm Pacific/4pm Eastern
 
** Thurs, Jan 17th, 11am Pacific/2pm Eastern
 
** Thurs, Jan 17th, 1pm Pacific/4pm Eastern
 
* Propose we go with Wed, Jan 16th, 1pm Pacific/4pm Eastern
 
  
=== Automated descriptions ===
* Distinguishing information rich vs. poor genes
 
* Information poor genes can take advantage of information across MODs/species
 
* Need more robust QC pipeline; can work on for WormBase, and later apply to Alliance once worked out
 
* Working on expression statements for Alliance genes
 
* Considering rearrangement of description so disease features more prominently
 
  
=== Disease curation ===
 
* Disease model curation progressing; lots of discussions about data standards and entities in the Alliance Disease Working Group

* Considering SVM for disease; current paper flagging pipeline is rather broad

* 200+ papers available as a positive training set

* Results sections are not being extracted in the latest Textpresso (paper sectioning in general is not happening)
 
  
=== Noctua / GO-CAM ===
* Making progress on best practices
 
* Can use Noctua to generate GO annotations
 
* Starting to incorporate proteins
 
* Working with an ever-changing Noctua platform; bugs emerge as it is developed; may benefit from a frozen release of the software
 
* Next month or two, will import entire set of C. elegans GO annotations into Noctua
 
** Many decisions to make: how to model?
 
** Each gene will become a single Noctua model; not linked to each other initially
 
** Working on batch updates/uploads to Noctua
 
  
=== Expression cluster curation ===
* Wen working on 40-paper backlog; hoping to finish by WS271
* Wen wants to work on RNA-Seq tools next
** FPKM tools
** Filtering by datasets
** Would like tools ready before International C. elegans Meeting (June 2019)
  
=== Neural function curation ===
* Raymond: want to use design pattern strategy to curate
  
=== WOBr ===
* Now incorporating non-IEA disease annotations into WOBr
* Using disease-association file
 
  
=== Phenotype curation ===
* Will run a new round of phenotype requests on ~3,000 papers in next few weeks (last one ran in October)
* Processing community curation submissions
* Will recurate some community curation papers to check:
** 1. completeness of community curation
 
** 2. the time-savings of the phenotype form pipeline
 
* Have made recent improvements to phenotype request emails, allowing authors more feedback options which are now being readily used
 
* Working with new phenotype ontology GitHub repository
 
** OBO Foundry now pointing phenotype ontology at the GitHub repository (both OBO and OWL files)
 
** Need to update the citace upload procedure that generates the phenotype .ACE file; the script currently still reads the old OBO file location on Tazendra and needs to be updated to work off the new OBO file on GitHub
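Switching the .ACE step to the GitHub OBO file mostly means reading OBO `[Term]` stanzas. The sketch below is a minimal example. It assumes the standard OBO flat-file layout, with `id:`, `name:` and `is_obsolete:` tags. The .ace class and tag names it writes (`Phenotype`, `Primary_name`, `Dead`) are hypothetical placeholders, not the real citace schema. The sample IDs are made up.

```python
# Sketch: turn OBO [Term] stanzas into .ace-style records.
# ASSUMPTION: output class/tag names are placeholders, not the real citace schema.


def parse_obo_terms(lines):
    """Yield a dict (id, name, obsolete) for every [Term] stanza."""
    term = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts: emit the previous [Term], ignore other stanza types.
            if term is not None:
                yield term
            term = {"id": None, "name": None, "obsolete": False} if line == "[Term]" else None
            continue
        if term is None or ":" not in line:
            continue  # header lines, blank lines, or lines inside non-Term stanzas
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "id":
            term["id"] = value
        elif key == "name":
            term["name"] = value
        elif key == "is_obsolete":
            term["obsolete"] = value == "true"
    if term is not None:
        yield term


def to_ace(terms):
    """Render parsed terms as blank-line-separated .ace-style blocks."""
    blocks = []
    for t in terms:
        if not t["id"]:
            continue
        block = [f'Phenotype : "{t["id"]}"']  # hypothetical class name
        if t["name"]:
            block.append(f'Primary_name "{t["name"]}"')  # hypothetical tag
        if t["obsolete"]:
            block.append("Dead")  # hypothetical tag for obsolete terms
        blocks.append("\n".join(block))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = "[Term]\nid: EX:0000001\nname: example phenotype\n\n[Typedef]\nid: part_of\n"
    print(to_ace(parse_obo_terms(sample.splitlines())))
```

Non-`[Term]` stanzas such as `[Typedef]` are skipped on purpose, so relation definitions never end up as phenotype objects.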
 
  
=== Metabolomics ===
* Karen working with Michael Witting to pull in metabolomics data
* Integrating information about endogenous concentrations of metabolites
  
=== Automated descriptions React tool ===
* Juancarlos developed tool to request versions of the automated descriptions
* Will update pipeline to pull data from Alliance; currently coming from Tazendra
* Tracking how the descriptions are changing, by data module for example
* React tool currently on mangolassi but will move to Alliance at a location of Olin's choosing (AWS resource)
  
=== Transgenes in the Alliance ===
 
* Are transgenes being discussed at the Alliance?
 
* Yes, the phenotype and disease working group has been discussing
 
* Hasn't come up in recent weeks, but was discussed at face-to-face meeting
 
* One significant issue is that WormBase uniquely has extrachromosomal arrays, whereas other MODs (always?) have integrated transgenes and consider them types of alleles
 
* Chris will give Karen a heads up next time the issue is intended to be discussed within the Alliance
 
  
  
==January 17th, 2019==
  
=== Alliance Grant ===
* Review grant and see if anything important is missing or if there are any needed edits
* Tight on space, but feel free to add a sentence here or there
* Doc: https://docs.google.com/document/d/1HtTBnQYISfrMjnfFKEDaSjSazlyVBkvacOA8VWo8INY/edit?usp=sharing
 
  
 
==January 24th, 2019==
  
 
=== Author First Pass ===
 
* For strain identification, we are using the obo_name_strain table.
* There is an entry for 'Strain' in that table that leads to false positives.
* Is this entry needed for curation?
* If so, we will keep it and just filter it out for the purposes of AFP.
 
* In Tazendra with timestamp Jan 23, 2019; on Mangolassi with timestamp Nov 15, 2018
 
* In WS269 with timestamp '2018-09-25_17:00:39_pad'
 
* Linked to paper WBPaper00055300; Location 'PS'; species C. elegans
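Filtering the 'Strain' entry for AFP could be as simple as a stop list that is applied when building the matcher. The entry itself stays in obo_name_strain. The sketch below is a minimal example. It assumes the strain names have already been loaded from that table, and `AFP_STOPLIST` is a hypothetical name.

```python
import re

# Hypothetical stop list: dictionary entries that are also common words in papers.
AFP_STOPLIST = frozenset({"Strain"})


def build_strain_matcher(strain_names, stoplist=AFP_STOPLIST):
    """Compile one whole-word regex over all strain names except stop-listed ones."""
    names = sorted({n for n in strain_names if n and n not in stoplist}, key=len, reverse=True)
    if not names:
        return None
    # Longest names first, so a longer name wins over a shorter name it starts with.
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    return re.compile(pattern)


def find_strains(text, matcher):
    """Return the distinct strain names mentioned in text, sorted."""
    if matcher is None:
        return []
    return sorted(set(matcher.findall(text)))


if __name__ == "__main__":
    names = ["N2", "CB4856", "Strain"]
    text = "Strain N2 and the Hawaiian strain CB4856 were used."
    print(find_strains(text, build_strain_matcher(names)))  # ['CB4856', 'N2']
```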
 
 
 
=== Specifically expressed genes ===
 
* On anatomy pages, in the Ontology Browser widget, we have a list of genes in a box that says "There are ### genes that may be specifically expressed."
 
* The list combines (1) genes that expression pattern (Expr_pattern) objects show to be expressed only in that tissue/cell or its subtypes and nowhere else, AND (2) genes that expression cluster data show to be enriched in that tissue/cell or its subtypes; the second group may include genes that expression cluster data show to be expressed (to some degree, albeit at low levels) in other tissues
 
* Wording is currently a bit misleading; should the statement/wording change or should we change the algorithm?
 
* Could offer a specifically expressed list and an enriched list separately
 
* Warrants more discussion
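The "separate lists" option could look like the sketch below. It computes the strictly specific genes (from Expr_pattern) and the enriched genes (from expression clusters) as separate sets. It also keeps their union, which corresponds to the combined list described above. The input shapes are plain dicts and the identifiers are made up. Both are assumptions for illustration; the real data would come from the WormBase database.

```python
def anatomy_gene_lists(expr_pattern, enriched, allowed_terms):
    """Split an anatomy page's gene list into its two evidence sources.

    expr_pattern: gene -> set of anatomy terms where Expr_pattern reports expression
    enriched: anatomy term -> set of genes enriched there (expression clusters)
    allowed_terms: the page's anatomy term plus its subtypes
    """
    allowed = set(allowed_terms)
    # (1) Expressed somewhere, and only within the term or its subtypes.
    specific = {g for g, terms in expr_pattern.items() if terms and set(terms) <= allowed}
    # (2) Enriched in the term or any subtype; may also be expressed elsewhere.
    enriched_here = set()
    for term in allowed:
        enriched_here |= enriched.get(term, set())
    return {
        "specifically_expressed": specific,
        "enriched_not_specific": enriched_here - specific,
        "combined_list": specific | enriched_here,
    }


if __name__ == "__main__":
    expr = {"geneA": {"cellX"}, "geneB": {"cellX", "tissueY"}, "geneC": {"cellX_left"}}
    clusters = {"cellX": {"geneB", "geneD"}}
    lists = anatomy_gene_lists(expr, clusters, {"cellX", "cellX_left", "cellX_right"})
    print(sorted(lists["specifically_expressed"]))  # ['geneA', 'geneC']
```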
 
 
 
 
 
== January 31st, 2019 ==
 
 
 
=== Specifically expressed genes ===
 

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • In the example shown, a "dumb" machine already starts out with precision above 0.6
  • G-value (Michael's invention); does not depend on distribution of sets
  • Applied to various data types
  • Analysis: 10-fold cross-validation
    • Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than random negative
  • AUC values for many WB data types range from the high 80s into the 90s (%)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
  • If SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
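The measures discussed above can be illustrated with a small sketch. It assumes a "dumb" machine is one that labels every paper positive. In that case its precision is simply the fraction of positives in the test set. AUC is computed directly from the stated definition, with ties counting one half, and it does not change when the negatives are duplicated. The cross-validation step shuffles positives and negatives and deals them into 10 folds without replacement. Michael's G-value is not reproduced here because these notes do not define it. All function names are illustrative.

```python
import random


def precision_all_positive(n_pos, n_neg):
    """Precision of a 'dumb' classifier that calls every paper positive:
    precision = n_pos / (n_pos + n_neg), i.e. just the class ratio."""
    return n_pos / (n_pos + n_neg)


def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.
    Ties count as one half. O(P*N), fine for a sketch."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def k_fold_splits(pos_ids, neg_ids, k=10, seed=0):
    """Shuffle positives and negatives separately, then deal them round-robin
    into k folds: each paper lands in exactly one fold (no replacement), and
    every fold keeps roughly the overall positive/negative ratio."""
    rng = random.Random(seed)
    pos, neg = list(pos_ids), list(neg_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [[] for _ in range(k)]
    for i, paper in enumerate(pos):
        folds[i % k].append(paper)
    for i, paper in enumerate(neg):
        folds[i % k].append(paper)
    return folds


if __name__ == "__main__":
    # The dumb classifier's precision simply tracks the class ratio.
    print(precision_all_positive(70, 30))  # 0.7
    print(precision_all_positive(30, 70))  # 0.3
    # Tripling the negatives changes the ratio but not the AUC.
    pos, neg = [0.9, 0.8, 0.4], [0.5, 0.3]
    print(auc(pos, neg) == auc(pos, neg * 3))  # True
```

Dealing positives and negatives separately keeps the class ratio similar across folds. This mirrors "randomly select 10% pos and neg ... until all papers sampled".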

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Existing definitions use the following words or meanings:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

    • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we will get it well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when users want to zoom in, so we should leave it as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
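A two-gene comparison like the prototype diff tool can be reduced to set operations on annotated terms. The sketch below is a minimal example, not the Raymond/Juancarlos implementation. It adds a subset check because Paul's point suggests that when one gene's terms largely contain the other's, the differences may reflect annotation coverage rather than biology. The term IDs are placeholders.

```python
def diff_annotations(terms_a, terms_b):
    """Compare the ontology terms annotated to two genes."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard similarity: shared terms / all terms (1.0 if both are empty).
        "jaccard": len(a & b) / len(union) if union else 1.0,
        # If one set contains the other, the "differences" may just mean the
        # less-studied gene has fewer annotations, not different biology.
        "one_is_subset": a <= b or b <= a,
    }


if __name__ == "__main__":
    result = diff_annotations(["T1", "T2", "T3", "T4"], ["T1", "T2"])
    print(result["jaccard"], result["one_is_subset"])  # 0.5 True
```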


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now
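Once the complete mapping file arrives, mapping free-text and ontology-name strain entries to IDs is a lookup. Anything that does not resolve should go to manual review. The sketch below is a minimal example. It assumes a two-column tab-separated file (name, ID), and the real Hinxton file may be laid out differently. The `STRAIN:000N` IDs are made-up placeholders, not real WormBase strain IDs.

```python
import csv
import io


def load_strain_map(tsv_text):
    """Read a two-column 'strain name <TAB> strain ID' file into a dict.
    ASSUMPTION: the real mapping file layout may differ."""
    mapping = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 2 and row[0].strip():
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def map_strain_entries(entries, mapping):
    """Split OA strain entries into resolved IDs and entries needing manual review."""
    mapped, unmapped = [], []
    for entry in entries:
        key = entry.strip()  # tolerate stray whitespace in free-text fields
        if key in mapping:
            mapped.append(mapping[key])
        else:
            unmapped.append(entry)
    return mapped, unmapped


if __name__ == "__main__":
    strain_map = load_strain_map("N2\tSTRAIN:0001\nCB4856\tSTRAIN:0002\n")  # made-up IDs
    print(map_strain_entries([" N2", "CB4856", "N2 males"], strain_map))
    # (['STRAIN:0001', 'STRAIN:0002'], ['N2 males'])
```

Entries like "N2 males" are returned unchanged so that a curator can decide what they refer to. They are not force-matched to an ID.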

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating over "is a", "part of" and "regulates"
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
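A user-selectable relation set could be a single parameter on the step that propagates annotations up the ontology. The sketch below is a minimal example. It assumes the ontology is held as a child -> [(relation, parent)] dict. For simplicity it chains all allowed relations transitively, whereas real GO reasoning has specific rules for "regulates" chains. Counts are genes per term. The term and gene names are hypothetical.

```python
from collections import deque

DEFAULT_RELATIONS = frozenset({"is_a", "part_of", "regulates"})


def ancestors(term, edges, relations=DEFAULT_RELATIONS):
    """All terms reachable from `term` via the allowed relations (BFS)."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate_counts(annotations, edges, relations=DEFAULT_RELATIONS):
    """annotations: gene -> set of directly annotated terms.
    Returns term -> number of genes annotated to it or to a descendant."""
    counts = {}
    for terms in annotations.values():
        closure = set(terms)
        for t in terms:
            closure |= ancestors(t, edges, relations)
        for t in closure:
            counts[t] = counts.get(t, 0) + 1
    return counts


if __name__ == "__main__":
    edges = {"A": [("is_a", "B")], "B": [("regulates", "C")], "C": [("part_of", "D")]}
    genes = {"g1": {"A"}, "g2": {"C"}}
    print(propagate_counts(genes, edges)["C"])                        # 2
    print(propagate_counts(genes, edges, {"is_a", "part_of"})["C"])   # 1
```

Excluding "regulates" stops g1 from being counted at C and D, which is the effect the notes propose to offer as an option.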

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input