Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
(363 intermediate revisions by 9 users not shown)
  
 
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
 
  
  
  
  
= 2018 Meetings =
 
 
[[WormBase-Caltech_Weekly_Calls_January_2018|January]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_February_2018|February]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_March_2018|March]]
 
 
 
[[WormBase-Caltech_Weekly_Calls_April_2018|April]]
 
  
[[WormBase-Caltech_Weekly_Calls_May_2018|May]]
  
[[WormBase-Caltech_Weekly_Calls_June_2018|June]]
  
[[WormBase-Caltech_Weekly_Calls_July_2018|July]]
  
[[WormBase-Caltech_Weekly_Calls_August_2018|August]]
  
  
== September 6, 2018 ==
  
=== Genotype class ===
* Chris started an initial document to draw up the ?Genotype class and make appropriate changes to the ?Strain class
 
**  https://docs.google.com/document/d/19hP9r6BpPW3FSAeC_67FNyNq58NGp4eaXBT42Ch3gDE/edit?usp=sharing
 
* It would be good for people to look at it so we can discuss it next time

* It would also be good to have Kevin H take a look and provide feedback
 
  
=== Citace Upload ===
* Send citace files to Wen by Sept 18, 10am Pacific
 
  
=== Automated gene descriptions ===
 
* The group is making improvements
 
* Added disease, protein domains
 
** When there is direct experimental evidence for disease relevance, the description will say "gene has been used to study"

* When data are minimal (information-poor genes), we can refer to the human ortholog and the data stored for that human gene in the Alliance
 
* Continue to receive feedback from users; include enrichment information, etc.
 
* Now have good trimming algorithms to retain important info without flooding a description with too many granular terms
 
* Will not store automated descriptions in Postgres
 
* Wen will modify SimpleMine scripts to accommodate change
 
* Would be good to write a paper on automated concise descriptions
 
* Came up at GO meeting/hackathon: Translating GO-CAM models into concise descriptions?
 
** Should be doable; just need to develop code when we're ready to do that
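The trimming idea above (keep the important information without listing too many granular terms) can be sketched as follows. This is a hypothetical illustration, not the Alliance/WormBase description code. It assumes a toy ontology given as a child-to-parents map. While a description would list more than `max_terms` terms, it merges the most specific group of two or more terms into their shared ancestor.

```python
from collections import defaultdict

def ancestors(term, parents):
    """All ancestors of `term` (not including `term`) in a child -> [parents] map."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def trim_terms(terms, parents, max_terms=3):
    """Hypothetical trimming: while more than `max_terms` terms remain, merge the
    most specific group of at least two terms into their shared ancestor."""
    current = set(terms)
    while len(current) > max_terms:
        coverage = defaultdict(set)
        for t in current:
            for a in ancestors(t, parents):
                coverage[a].add(t)
        # Only an ancestor covering two or more current terms shortens the list.
        candidates = {a: ts for a, ts in coverage.items() if len(ts) >= 2}
        if not candidates:
            break
        # Prefer the ancestor covering the fewest terms; break ties toward deeper terms.
        best = min(candidates,
                   key=lambda a: (len(candidates[a]), -len(ancestors(a, parents)), a))
        current = (current - candidates[best]) | {best}
    return sorted(current)

# Toy anatomy fragment (hypothetical, for illustration only).
toy_parents = {
    "pharynx": ["alimentary system"],
    "intestine": ["alimentary system"],
    "rectum": ["alimentary system"],
    "nerve ring": ["nervous system"],
    "alimentary system": ["organ system"],
    "nervous system": ["organ system"],
}
print(trim_terms(["pharynx", "intestine", "rectum", "nerve ring"], toy_parents, max_terms=2))
# ['alimentary system', 'nerve ring']
```

Choosing the ancestor that covers the fewest terms avoids collapsing everything straight to a root term, which would be short but uninformative.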
 
  
=== Textpresso presentation at next Alliance all-hands call ===
* Who should present? Maybe have several people? Kimberly, Valerio, Michael for sections?
 
* We can discuss at next Textpresso meeting
 
* Should cover techniques and software, but keep it generally simple and comprehensible for a larger audience
 
  
  
== September 13, 2018 ==
  
 
=== Citace upload ===
 
* Send files to Wen by 10am Tuesday (18th)
  
=== Genotype class ===
* [https://docs.google.com/document/d/19hP9r6BpPW3FSAeC_67FNyNq58NGp4eaXBT42Ch3gDE/edit?usp=sharing ?Genotype class proposal]
* Genotype_name tag: free text summary of the genotype
** Can this be automatically generated from components? Ideally, yes, but may be difficult
** Otherwise, can be manually written, as we have been doing, but is a bit denormalized and may require maintenance
 
* Genotype_description tag: free-text description of the genotype
 
** No precedent and probably not going to start now, so will remove from model
 
* Genotype_components supertag: to collect genotype component objects and, where necessary, free text
 
** We want to be able to express zygosity for each referenced object, likely requires a #Zygosity hash (and hence a ?Zygosity model)
 
** ?Zygosity model can have three main tags: "Homozygous", "Heterozygous_with_wild_type", or "Heteroallelic_combination_with"
 
*** Heteroallelic_combination_with tag could further specify the type and identity of the object that is in heteroallelic combination with the original object
 
*** Since it is not ideal to store an arbitrary component in the #Zygosity hash, we should probably just state the zygosity as "Heteroallelic_combination" and for display purposes have an automated way to calculate which components affect the same locus/loci (if necessary)
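The "automated way to calculate which components affect the same locus/loci" could look something like the sketch below. It assumes that each genotype component already carries the set of genes it affects. The enum values mirror the tags discussed for the ?Zygosity model. The component names and gene IDs are hypothetical placeholders, and this is not an ACeDB model definition.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import combinations

class Zygosity(Enum):
    # Values mirror the tags discussed for a ?Zygosity model
    HOMOZYGOUS = "Homozygous"
    HETEROZYGOUS_WITH_WILD_TYPE = "Heterozygous_with_wild_type"
    HETEROALLELIC_COMBINATION = "Heteroallelic_combination"

@dataclass(frozen=True)
class Component:
    name: str            # hypothetical component (allele/transgene) name
    genes: frozenset     # IDs of the gene(s) the component affects (assumed known)
    zygosity: Zygosity

def heteroallelic_pairs(components):
    """Pair components stated as heteroallelic that affect at least one shared gene."""
    flagged = [c for c in components
               if c.zygosity is Zygosity.HETEROALLELIC_COMBINATION]
    pairs = []
    for a, b in combinations(flagged, 2):
        shared = a.genes & b.genes
        if shared:
            pairs.append((a.name, b.name, sorted(shared)))
    return pairs

genotype = [
    Component("allele_1", frozenset({"GENE_A"}), Zygosity.HETEROALLELIC_COMBINATION),
    Component("allele_2", frozenset({"GENE_A"}), Zygosity.HETEROALLELIC_COMBINATION),
    Component("allele_3", frozenset({"GENE_B"}), Zygosity.HOMOZYGOUS),
]
print(heteroallelic_pairs(genotype))  # [('allele_1', 'allele_2', ['GENE_A'])]
```

With this approach, the stored data only records "Heteroallelic_combination". The partner component is derived at display time rather than being stored in the #Zygosity hash.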
 
  
  
== September 20, 2018 ==
 
  
=== Kimberly's talk at Rutgers ===
* Kimberly went to a worm meeting at Rutgers (10 labs using C. elegans?; 6-7 entirely worm-centric)
 
* 30-40 attendees (PIs, postdocs, grad students)
 
* Discussed tools and features at WB
 
* Presented Alliance pages and Textpresso
 
* PIs are enthusiastic about WB
 
* Monica Driscoll made a good plug for Textpresso
 
* People requesting FAQs and user guides (text and videos)
 
* Monica suggested a WB tutorial for PIs ;p
 
* Some people surprised about what they can accomplish using the tools available, like SimpleMine
 
* Covered gene set enrichment, WormMine, SPELL, SimpleMine, ParaSite BioMart, Textpresso
 
* Would be good to show people how to use Textpresso Central
 
* Discussed micropublications, asked about negative results (precedent?)
 
* Some asked about WB funding and Alliance plans
 
* Can we make a within-page search available to find, for example, field names, etc.?
 
* Some challenges in finding genes/proteins of a certain class
 
** Had question about histone genes recently
 
** Repeatedly have had questions about finding "ion channels"
 
** Searching gene class with text pulls out lots of false positives
 
** Could perform an analysis on particular classes of genes (e.g. histones or ion channels) and generate a micropublication providing the curated list
 
** Can generate a WormMine template query to pull these out for each release
 
** What classes of genes would we want to identify: histones, transcription factors, ion channels, protein kinases
 
** Chris will look into WormMine templates using gene class info and look into pulling in protein motif information
 
** We will ask other MODs and UniProt about how they deal with this issue
 
  
  
== September 27, 2018 ==
=== Update on the new AFP form and pipeline ===
* Daniela, Kimberly, Juancarlos, and Valerio will update us on the current status of the new AFP form and pipeline
**Overall, the goal has been to incorporate as much Textpresso-based entity and data-type flagging as possible into the form
 
**Move from author data flagging to author data validation wherever we can
 
**Provide opportunities for authors to submit more detailed curation if they want
 
* General: data types flagged positive by SVM get a pre-checked checkbox
 
* General: Question mark icons with help text
 
* Gene recognition
 
** Need to set a threshold of mentions; don't necessarily want all genes mentioned once
 
*** Can we show all genes, ranked by occurrence?
 
** Don't want to overwhelm users
 
** How are the genes identified? Via the Textpresso pipeline: string matching, then consolidating multiple instances (protein, gene, etc.) into a single gene result
 
** Searches include supplemental materials
 
** Cannot search by section of paper
 
** Can we identify genes from species other than C. elegans? Not at present; we will stick to C. elegans for now
 
** Will expand to non-elegans nematodes in future; will expand to other species when extending to other MODs/Alliance members
 
** Chris: should we show the name of the gene as mentioned, verbatim, from the paper?
 
*** Karen: No, we should insist authors use the proper names
 
*** Chris: Meant referencing sequence names in paper, but public name comes out by the time AFP goes to authors, causing confusion
 
** Can we pull genes from tables? We are pulling from PDF tables, but not supplemental Excel tables, for example
 
* Gene model updates: checkbox yes/no
 
* Species in paper
 
** Including worm, mouse, human, yeast
 
** Still more work to do on this front
 
* Alleles recognized
 
** Show list of allele names and WBVar IDs for confirmation
 
** Can submit new alleles within the AFP form (just allele names, no genes or other info; keeping it simple)
 
* Allele sequence change checkbox yes/no (link to Allele sequence info form)
 
* Can there be a feedback option readily available? There is a comments section toward the end of the form under "Anything else?"
 
* Transgenes handled like alleles
 
* Antibodies
 
** Newly generated antibodies: checkbox and text field (ask for details? for consistency with alleles?); maybe we shouldn't ask for antibody details, or we can make details optional
 
** Form for existing antibodies
 
* Expression data
 
** Anatomic expression in WT
 
** Site of action (may be difficult to interpret user input; ask for example; make text details required)
 
** Time of action
 
** RNAseq data
 
* Microarrays - just link out to GEO
 
* Interactions (all SVM based, three checkboxes)
 
* Phenotypes (SVMs, link to phenotype form)
 
* Disease
 
** Checkbox for worm orthologs of human disease gene, etc.
 
* Comments section (to point out missing data types, provide general comments on form)
 
** Ask for unpublished data and suggest micropublication
 
* Final thank you and update contact info and lineage
 
* CIT feedback
 
** Maybe make font size larger
 
** Mobile device compatible? Yes
 
** Change "Anything else?" to "Anything else? Comments?"
 
** Can people save and return later? Yes
 
*** How do we know they're finished? There is a "Finish and submit" button at end (but authors can still go back and make changes later)
 
*** Maybe move "Finish and submit" button to left panel so it is always visible? Maybe make the button stand alone?
 
** If authors indicate there are physical interactions, can we distinguish elegans-elegans interactions vs. non-elegans or interspecies interactions? No, we cannot yet distinguish
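The gene recognition points above are: consolidate gene and protein mentions into one gene, rank genes by occurrence, and apply a mention threshold before showing them to authors. The sketch below is a hypothetical illustration of that logic, not the Textpresso pipeline. The name-to-gene lookup and the placeholder gene IDs are assumptions.

```python
from collections import Counter

def rank_gene_mentions(mentions, name_to_gene, min_count=2):
    """Collapse raw name mentions onto gene IDs, rank by frequency, and apply a
    display threshold. Returns (genes to show, full ranked list)."""
    counts = Counter(name_to_gene[m] for m in mentions if m in name_to_gene)
    # Sort by descending count, then by ID so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    shown = [(gene, n) for gene, n in ranked if n >= min_count]
    return shown, ranked

# Hypothetical lookup: gene-style and protein-style names map to the same gene.
name_to_gene = {"unc-22": "GENE_1", "UNC-22": "GENE_1",
                "dpy-10": "GENE_2", "DPY-10": "GENE_2", "lin-12": "GENE_3"}
mentions = ["unc-22", "UNC-22", "unc-22", "DPY-10", "dpy-10", "lin-12", "gfp"]
shown, ranked = rank_gene_mentions(mentions, name_to_gene)
print(shown)   # [('GENE_1', 3), ('GENE_2', 2)]
print(ranked)  # [('GENE_1', 3), ('GENE_2', 2), ('GENE_3', 1)]
```

Returning both lists would support the two options raised: showing only genes above a threshold, or showing all genes ranked by occurrence.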
 
  
=== Genetics and G3 papers in Textpresso ===
* These papers do not get a PMID right away (when they first enter WB), only a DOI (and most of the time the DOI doesn't work yet)
* DOI should work right away; Karen will look into if there's a problem/typo
* Daniel needs to keep track of these papers, then go back and merge WBPapers once the PMID goes live
 
* Kimberly or Karen may have to send papers directly to Daniel for uploading
 
* Should Daniel only download papers with a PubMed ID? Yes, except for micropublications?
 
* Need a separate pipeline for micropublications?
 
  
=== ParaSite (non-elegans) papers ===
* Should Daniel be trying to download all of these papers? Many are hard to track down
* Daniel should ask Michael Paulini

Revision as of 16:39, 19 September 2019

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings


GoToMeeting link: https://www.gotomeet.me/wormbase1


2019 Meetings

January

February

March

April

May

June

July

August


September 12, 2019

Update on SVM pipeline

  • New SVM pipeline: more analysis and more parameter tuning
  • Avoiding precision (and F-value) as a measure, since both depend on the ratio of positives to negatives in the test set
  • In the example shown, a "dumb" machine starts out with precision above 0.6
  • G-value (Michael's invention) does not depend on the distribution of positives and negatives in the sets
  • Applied to various data types
  • Analysis: 10-fold cross validation
    • Randomly select 10% pos and neg (without replacement) and repeat until all papers sampled
  • F-value changes over different p/n values; G-value does not (essentially flat)
  • Area Under the Curve (AUC): probability that a random positive scores higher than random negative
  • AUC values for many WB data types range from the upper 80s into the 90s (in percent)
  • Ranjana: How many papers for a good training set? Michael: we don't know yet
  • Can't reproduce old training sets (for old SVM); provide Michael better training sets if you want improved SVM
  • If the SVM is still not good enough, Michael will work on deep neural networks (TensorFlow)
  • Michael can provide training sets he has used recently
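The evaluation points above can be illustrated with a small sketch. It is not Michael's pipeline, and it does not implement the G-value, whose definition is not given here. It assumes that "randomly select 10% pos and neg without replacement" amounts to stratified folds. It also shows that an all-positive baseline's precision simply equals the positive fraction, whereas AUC, computed as the probability that a random positive outscores a random negative, does not change when the class ratio changes.

```python
import random
from itertools import product

def stratified_folds(pos_ids, neg_ids, k=10, seed=0):
    """Shuffle each class, then deal it into k folds so every fold holds about
    1/k of the positives and 1/k of the negatives; each paper lands in exactly one fold."""
    rng = random.Random(seed)
    pos, neg = list(pos_ids), list(neg_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def all_positive_precision(n_pos, n_neg):
    """A 'dumb' classifier that calls every paper positive has precision equal
    to the fraction of positives in the test set."""
    return precision(tp=n_pos, fp=n_neg)

def auc_by_pairs(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative
    (ties count one half)."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Same scores, different class ratios: precision of the all-positive baseline
# moves with the ratio, while AUC stays put.
pos, neg = [0.9, 0.8, 0.4], [0.7, 0.3]
print(all_positive_precision(len(pos), len(neg)))      # 0.6
print(all_positive_precision(len(pos), 3 * len(neg)))  # 0.3333333333333333
print(auc_by_pairs(pos, neg), auc_by_pairs(pos, 3 * neg))
```

The pairwise AUC loop is quadratic. That is fine for illustration, but a rank-based computation would scale better on large test sets.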

Clarifying definitions of "defective" and "deficient" for phenotypes

  • WB phenotype ontology has many "variant/abnormal" terms and distinct subclass terms for "defective/deficient"
  • Have tried to create a logical definition pattern for these terms, but the vagueness of the meaning of "defective" and how it is distinct from "abnormal" has stalled the process
  • What do we mean exactly by "defective" and how, specifically, is this distinct from "abnormal"?
  • Existing definitions use the following words or meanings:
    • "Variations in the ability"
    • "aberrant"
    • "defect"
    • "defective"
    • "defects"
    • "deficiency"
    • "deficient"
    • "disrupted"
    • "impaired"
    • "incompetent"
    • "ineffective"
    • "perturbation that disrupts"
    • Failure to execute the characteristic response = abnormal?
    • abnormal
    • abnormality leading to specific outcomes
    • fail to exhibit the same taxis behavior = abnormal?
    • failure
    • failure OR delayed
    • failure, slower OR late
    • failure/abnormal
    • reduced
    • slower

Citace upload

  • Upload deadline: Tuesday, Sep 24th

Strain to ID mapping

  • Waiting on Hinxton to send strain ID mapping file?
  • Hopefully we can all get that well before the upload deadline
  • Will do global replacement at time of citace upload (at least for now)
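A global replacement of strain names by strain IDs could be sketched as below. The mapping file layout (two tab-separated columns: strain name, strain ID) is an assumption, because the format of the file from Hinxton is not described here. The names and IDs in the example are hypothetical placeholders. Unmatched values are kept unchanged and reported so that they can be mapped by hand.

```python
import csv
import io

def load_strain_map(tsv_text):
    """Parse an assumed two-column TSV: strain name <TAB> strain ID."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0].strip(): row[1].strip() for row in rows if len(row) >= 2}

def replace_strain_names(values, strain_map):
    """Replace each field value that matches a known strain name with its ID;
    keep unmatched values unchanged and report them for manual review."""
    replaced, unmatched = [], []
    for value in values:
        key = value.strip()
        if key in strain_map:
            replaced.append(strain_map[key])
        else:
            replaced.append(value)
            unmatched.append(value)
    return replaced, unmatched

# Hypothetical file contents and field values.
mapping = load_strain_map("strainA\tID_0001\nstrainB\tID_0002\n")
print(replace_strain_names(["strainA", " strainB ", "free-text note"], mapping))
# (['ID_0001', 'ID_0002', 'free-text note'], ['free-text note'])
```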

New name server

  • When will this officially go live?
  • Will we now be able to request strain IDs through the server? Yes

SObA Graphs

  • New graphs now live on site (Expression, Gene Ontology, Human Diseases, Phenotypes)
  • A lot of whitespace padding above and below the graph; maybe trim? Trimming vertically would limit the view pane when a user wants to zoom in, so we will leave it as is for now
  • Diff tool: Raymond and Juancarlos created a prototype diff tool (for comparing two genes, for example)
    • Paul: compared two genes that should be very similar, but there are a lot of differences; may reflect annotation coverage rather than biology
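At its simplest, comparing two genes amounts to a set comparison of their annotation terms. The sketch below is hypothetical and is not the Raymond/Juancarlos prototype; the term names are placeholders. As Paul noted, a term listed as specific to one gene may only mean that the other gene has not been annotated with it.

```python
def diff_annotations(terms_a, terms_b):
    """Split two genes' annotation term sets into shared and gene-specific terms,
    plus a Jaccard similarity (shared / union)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }

print(diff_annotations(["term1", "term2", "term3"], ["term2", "term3", "term4"]))
# {'shared': ['term2', 'term3'], 'only_a': ['term1'], 'only_b': ['term4'], 'jaccard': 0.5}
```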


September 19, 2019

Strains

  • Need to wait for new strain IDs from Hinxton before running dumping scripts
  • Don't edit multi-ontology strain fields in OA for now!
  • Juancarlos will map free text and ontology-name strain entries to strain IDs once we have the complete mapping file
  • "Requested strain" field in Disease OA; not dumped, so don't need to worry about right now

Alliance literature curation

  • Working group will be formed soon
  • Will work out general common pipelines for literature curation

SObA Graph relations

  • Currently only integrating over "is a", "part of" and "regulates"
  • Maybe we could provide users an option to specify which relations to include, or maybe just exclude "regulates"
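A user-selectable relation set might work like the sketch below, which walks up the ontology following only the allowed relations. The edge representation and relation labels are assumptions for illustration, and this is not the SObA code. Excluding "regulates" shrinks the set of ancestors to which annotations would be aggregated.

```python
from collections import deque

def ancestors_via(term, edges, allowed):
    """Breadth-first walk up from `term`, following only edges whose relation
    is in `allowed`. `edges` maps term -> list of (relation, parent)."""
    seen, queue = set(), deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, ()):
            if relation in allowed and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# Hypothetical ontology fragment.
edges = {
    "T1": [("is_a", "T2"), ("regulates", "T3")],
    "T2": [("part_of", "T4")],
    "T3": [("is_a", "T5")],
}
print(sorted(ancestors_via("T1", edges, {"is_a", "part_of", "regulates"})))  # ['T2', 'T3', 'T4', 'T5']
print(sorted(ancestors_via("T1", edges, {"is_a", "part_of"})))               # ['T2', 'T4']
```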

Author First Pass

  • Putting together paper for AFP
  • Reviewing all user input for paper
  • Asking individual curators to check input