Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(157 intermediate revisions by 7 users not shown)
Line 31: Line 31:
 
[[WormBase-Caltech_Weekly_Calls_March_2019|March]]
 
  
== April 4, 2019 ==
+
[[WormBase-Caltech_Weekly_Calls_April_2019|April]]
  
=== Paul's Biocuration Keynote ===
+
[[WormBase-Caltech_Weekly_Calls_May_2019|May]]
* At Biocuration keynote, Paul S. will talk about web forms, including Author First Pass form
 
* Can someone send Paul the latest AFP form location and documentation
 
* Will also discuss the future of micropublications
 
  
=== IWM Swag ===
+
[[WormBase-Caltech_Weekly_Calls_June_2019|June]]
* Stickers vs. screen cloths
 
* Daniela had contacted Jessie to make new graphic; she's interested, but do we still need a new graphic?
 
* Plan was to make ~2,000 screen cloths (4x4 inch or 6x6 inch?)
 
** As for the design, it would be nice to have a new one
 
** Gary had made cartoon for community curation (proposal might be offensive to some)
 
* Include water bottles for special contributors? In addition to shot glasses?
 
* Turnaround time? ~1 month; designs done in April; submit request by mid-May at latest
 
* Worm in apple graphic? Dragon worm cartoon? Maybe some variation on it?
 
** We can ask Jessie if she can draw up a couple quick sketch ideas
 
  
=== Collaborators tool ===
+
[[WormBase-Caltech_Weekly_Calls_July_2019|July]]
* Juancarlos worked on a tool to extract WBPerson collaborators
 
* WB staff generally had ~100 collaborators, Paul S has 670
 
* Distinguish between collaborators and coauthors? Manual (self reported) vs. automatic associations?
 
* Make modification to ACEDB model?
 
* Maybe 'collaborators' should be restricted to the laboratory level?
 
  
 +
[[WormBase-Caltech_Weekly_Calls_August_2019|August]]
  
== April 11, 2019 ==
 
  
=== Canadian C. elegans meeting ===
+
== September 12, 2019 ==
the "Canadian C. elegans meeting" that will follow the 2019 annual conference of the Canadian Society for Molecular Biosciences. The CSMB conference is held at Université de Montréal on June 2-5, 2019 and will focus on Model systems in cancer research (https://www.fourwav.es/view/1174/info/). The C. elegans conference will be held at the CHUM Research Center in downtown Montréal on Thursday June 6, 2019, and will offer opportunities for both oral and poster presentations. The schedule is not finalized yet but it should start around 8:30am and end around 6pm, followed by an apero in a nearby pub. The costs of registrations will depend on the number of participants but will be between 35 and 50 $ per person (it will cover the costs of lunch, coffee break and logistics).
 
* Karen may be able to go; she will contact Kimberly for any pertinent information
 
  
=== Feedback from small RNA meeting April 3-5, 2019 ===
+
=== Update on SVM pipeline ===
* Future meeting invitations and summer workshop?
+
* New SVM pipeline: more analysis and more parameter tuning
* Dustin Updike - in Maine - interested in running a summer workshop for students
+
* Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
** Maybe one or two week-long (5-10 days); unconfirmed
+
* In the example shown, a "dumb" machine starts out with precision above 0.6
** Wen followed up with him: WormBase curator could attend and present on JBrowse, scientific writing, micropublications
+
* G-value (Michael's invention); does not depend on distribution of sets
** Hopefully they can cover travel costs of WB curators
+
* Applied to various data types
** Dustin will get back to us about details as they emerge
+
* Analysis: 10-fold cross validation
* Julie Claycomb interested in having WB curator at Toronto's worm meeting
+
** Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
 +
* F-value changes over different p/n values; G-value does not (essentially flat)
 +
* Area Under the Curve (AUC): probability that a random positive scores higher than random negative
 +
* AUC values for many WB data types are in the upper 80s into the 90s (percent)
 +
* Ranjana: How many papers for a good training set? Michael: we don't know yet
 +
* Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
 +
* If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
 +
* Michael can provide training sets he has used recently
  
=== New SVM approach ===
+
=== Clarifying definitions of "defective" and "deficient" for phenotypes ===
* Michael will send around paper list with new and old results in the same file
+
* WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
* Raymond looked at his data types; at first glance thinks there may be quite a few false positives and false negatives
+
* Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
* Can represent new SVM scores as histograms, presumably produce Gaussian curves, can we accept the overlap in distribution tails?
+
* What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
* Old thresholds were not necessarily gold standards, probably somewhat arbitrarily chosen initially; we can use the old metrics to roughly choose a new set of corresponding thresholds for the new SVM scores
+
* Definitions include meanings or words:
 +
** "Variations in the ability"
 +
** "aberrant"
 +
** "defect"
 +
** "defective"
 +
** "defects"
 +
** "deficiency"
 +
** "deficient"
 +
** "disrupted"
 +
** "impaired"
 +
** "incompetent"
 +
** "ineffective"
 +
** "perturbation that disrupts"
 +
** Failure to execute the characteristic response = abnormal?
 +
** abnormal
 +
** abnormality leading to specific outcomes
 +
** fail to exhibit the same taxis behavior = abnormal?
 +
** failure
 +
** failure OR delayed
 +
** failure, slower OR late
 +
** failure/abnormal
 +
** reduced
 +
** slower
  
=== IWM workshop ===
+
=== Citace upload ===
* On Saturday of meeting
+
** Tuesday, Sep 24th
  
=== NIH workshop on Trustworthiness ===
+
=== Strain to ID mapping ===
* Chris attended on Monday and Tuesday
+
* Waiting on Hinxton to send strain ID mapping file?
* Not quite clear that there is or will be any mandate from funders; just encouragement
+
* Hopefully we can all get that well before the upload deadline
* FAIR data and TRUST-worthy repositories
+
* Will do global replacement at time of citace upload (at least for now)
** FAIR = Findable, Accessible, Interoperable, Reusable
 
** TRUST = Transparency, Responsibility, User community, Sustainability, Technology
 
* Chris will review briefly at next site-wide call
 
* Karen: This is something the data commons really wants
 
** Maybe just need to make sure that the Alliance meets these requirements going forward, not so much for WormBase (?)
 
  
 +
=== New name server ===
 +
* When will this officially go live?
 +
* Will we now be able to request strain IDs through the server? Yes
  
== April 18, 2019 ==
+
=== SObA Graphs ===
 +
* New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
 +
* A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
 +
* Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
 +
** Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
  
=== IWM swag ===
 
* Stickers, screen cloths?
 
* Jessie's design(s)?
 
* Daniel is reaching out to Caltech security, who have nice screen cloths
 
* Print list of WormBase tools or user guide on cloth (one side; logo on the other?)?
 
* Will get quote (maybe ≤$1 per cloth?)
 
  
=== Life stage and anatomy ontologies ===
+
== September 19, 2019 ==
* Each ontology now has an ODK GitHub repository (thanks Nico!):
 
** Anatomy ontology: https://github.com/obophenotype/c-elegans-gross-anatomy-ontology
 
** Life stage ontology: https://github.com/obophenotype/c-elegans-development-ontology
 
* We will perform a round of comparisons between the original OBO files and the ODK-generated OBO files
 
* Will follow up with quality control fixes (e.g. duplicate or missing definitions)
 
  
=== Biocurator meeting ===
+
=== Strains ===
* EuroPMC - have access to 80-90% full text of open access papers
+
* Need to wait for new strain IDs from Hinxton before running dumping scripts
** Would be good for WormBase & Textpresso to work with them
+
* Don't edit multi-ontology strain fields in OA for now!
* Lots of groups working together on pathways
+
* Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
** SwissProt, GO, Reactome
+
* "Requested strain" field in Disease OA is not dumped, so no need to worry about it right now
* SwissLipids
 
** Metabolomics database pilot
 
** https://www.swisslipids.org/#/
 
* RHEA to replace KEGG (which is well curated but not (no longer) open)
 
** https://www.rhea-db.org/home
 
* Author First Pass poster presented (won 3rd place out of ~200 posters!)
 
** Analogous to FlyBase's 'Fast Track Your Paper' (FTYP)
 
** FTYP does not extract entities
 
** Should evolve into an Alliance pipeline
 
* Data visualization talk (Sean O'Donoghue)
 
** Don't use rainbow heatmaps
 
** Oval representation of time course
 
** Valerio spoke to him; D3 is now a standard library for data visualization
 
  
=== GO CAM ===
+
=== Alliance literature curation ===
* Could try to generate automated descriptions using GO CAM models
+
* Working group will be formed soon
* Group is trying to come up with model naming scheme
+
* Will work out general common pipelines for literature curation
  
=== Gene Expression Omnibus (GEO) ===
+
=== SObA Graph relations ===
* GEO data has user submission forms that refer to life stages, anatomy but don't map to WB unique IDs
+
* Currently only integrating over "is a", "part of" and "regulates"
* Maybe we can ask GEO to update their data submission forms to use standard IDs
+
* Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
* Would help reduce amount of effort required by WB to curate the entries after submission to GEO
 
* GEO forms apply to all species; controlled vocabularies/ontologies could be submitted to GEO for incorporation into their forms
 
  
=== RNA-Seq data processing pipeline (Alaska) ===
+
=== Author First Pass ===
* Raymond, Juancarlos, David Angeles and Joseph developed automated data processing pipeline
+
* Putting together paper for AFP
* http://alaska.caltech.edu/
+
* Reviewing all user input for paper
* Maps reads to the genome; will it return FPKM values for each gene?
+
* Asking individual curators to check input
 
 
 
 
== April 25, 2019 ==
 
 
 
=== WormBase plans for coming year ===
 
* Jae
 
** help develop better usability of data and tools for expression and regulation
 
** Venn diagrams
 
** Get more up to date on protein-DNA and protein-RNA interactions, may distinguish small scale from large scale
 
** GO CAM curation, once there are defined best practices for editing with Noctua
 
* Kimberly
 
** Migration of all manual GO annotations to Noctua as GO CAM models; Jae may want to wait until the migration is done
 
** Will continue to develop GO CAM models and tools to help/expedite GO CAM model development
 
* Ranjana
 
** Gene descriptions: need to bring in pathway data; processes alone are insufficient; will try to use GO CAM models (proof of concept)
 
** Human disease curation: WB disease data into the Alliance
 
*** Improved text-mining approach to identify disease model papers
 
*** Paul S: May want to work on pathways/GO-CAM models for disease genes
 
* Wen
 
** Proteomics curation, tool development
 
*** Work with Hinxton to curate proteomics data at Caltech
 
*** 50 existing datasets
 
*** Common follow up to RNA Seq experiments
 
** Expression clusters - will focus on 'high quality' papers, i.e. useful to community; some have poor metadata and authors don't respond to emails
 
** Outreach - will explore possibility of having local institutes invite WormBase staff to give talks (pay travel costs?); coincide with meetings?
 
*** Northwest worm meeting in Seattle? Chicago worm meeting?
 
*** Bar Harbor tutorial this summer
 

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • In the example shown, a "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention); does not depend on distribution of sets
  • Applied to various data types
  • Analysis: 10-fold cross validation
    • Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than random negative
  • AUC values for many WB data types are in the upper 80s into the 90s (percent)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
  • If SVM still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
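
A minimal Python sketch of three ideas from the notes above: why precision tracks the positive/negative ratio, AUC as a pairwise probability, and splitting papers into 10 disjoint test folds. All function names and the toy numbers are hypothetical and are not taken from Michael's pipeline; the G-value is not shown because its definition is not given in these notes.

```python
import random


def precision_if_everything_called_positive(n_pos, n_neg):
    """Precision of a trivial classifier that flags every paper as positive.

    It equals the fraction of positives in the test set, so it changes
    whenever the positive/negative ratio changes.
    """
    return n_pos / (n_pos + n_neg)


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win, so a classifier that gives every paper the
    same score gets 0.5 regardless of the class ratio.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def stratified_folds(positives, negatives, k=10, seed=0):
    """Split positives and negatives into k disjoint test folds.

    Each paper lands in exactly one fold (sampling without replacement),
    and each fold holds roughly 1/k of the positives and of the negatives.
    """
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]


if __name__ == "__main__":
    # Same trivial classifier, two different test-set compositions.
    print(precision_if_everything_called_positive(60, 40))
    print(precision_if_everything_called_positive(10, 90))
    # Toy SVM scores (made up for illustration).
    print(auc([0.9, 0.7, 0.4], [0.5, 0.2, 0.1]))
    folds = stratified_folds(range(100), range(100, 400))
    print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

For each fold, the classifier would be trained on the other nine folds and scored on the held-out one; that training step is omitted here.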

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Definitions include meanings or words:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below graph; maybe trim? Trimming vertically would ultimately limit the view pane when a user wants to zoom in, so we should leave it as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA is not dumped, so no need to worry about it right now
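
The mapping step described above could look roughly like the following Python sketch. It assumes the mapping file is tab-separated with a strain name and a strain ID per line; the actual file format from Hinxton is not described in these notes, and the IDs and function names below are placeholders for illustration only.

```python
import csv
import io


def load_mapping(tsv_text):
    """Parse 'strain name<TAB>strain ID' lines into a dict (assumed format)."""
    mapping = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 2 and row[0].strip():
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def map_strains(entries, mapping):
    """Replace each strain name with its ID when the name is in the mapping.

    Entries with no match are left unchanged and also returned separately,
    so a curator can review them by hand.
    """
    mapped, unmatched = [], []
    for entry in entries:
        key = entry.strip()
        if key in mapping:
            mapped.append(mapping[key])
        else:
            mapped.append(entry)
            unmatched.append(entry)
    return mapped, unmatched


if __name__ == "__main__":
    # Placeholder IDs, not real assignments.
    example_file = "N2\tWBStrain00000001\nCB4856\tWBStrain00000002\n"
    mapping = load_mapping(example_file)
    print(map_strains(["N2 ", "CB4856", "XYZ1"], mapping))
```

Leaving unmatched entries untouched rather than dropping them keeps the replacement reversible until the complete mapping file is available.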

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating over "is a", "part of" and "regulates"
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
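
As a rough illustration of letting users choose which relations to integrate over, the Python sketch below collects a term's ancestors while following only the selected relation types. The toy graph and term names are invented, and this is not the SObA implementation.

```python
def ancestors(term, edges, relations):
    """Return all terms reachable from `term` via edges of the chosen types.

    `edges` maps a term to a list of (relation, parent) pairs.
    `relations` is the set of relation names the user wants to include.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in edges.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # Toy ontology fragment (hypothetical terms).
    edges = {
        "term_A": [("is a", "term_B"), ("regulates", "term_C")],
        "term_B": [("part of", "term_D")],
        "term_C": [("is a", "term_E")],
    }
    everything = ancestors("term_A", edges, {"is a", "part of", "regulates"})
    no_regulates = ancestors("term_A", edges, {"is a", "part of"})
    print(sorted(everything))
    print(sorted(no_regulates))
```

Excluding "regulates" removes not only the directly regulated term but also everything reached only through it, which is why the choice of relations can change a graph substantially.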

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input