WormBase-Caltech Weekly Calls September 2018

From WormBaseWiki

September 6, 2018

Genotype class

Citace Upload

  • Send citace files to Wen by Sept 18, 10am Pacific

Automated gene descriptions

  • Group making improvements
  • Added disease, protein domains
    • When there is direct experimental evidence for disease relevance, the description will say "gene has been used to study"
  • When data are minimal (information-poor genes), we can refer to the human ortholog and the data stored for that human gene in Alliance
  • Continue to receive feedback from users; include enrichment information, etc.
  • Now have good trimming algorithms to retain important info without flooding a description with too many granular terms
  • Will not store automated descriptions in Postgres
  • Wen will modify SimpleMine scripts to accommodate change
  • Would be good to write a paper on automated concise descriptions
  • Came up at GO meeting/hackathon: Translating GO-CAM models into concise descriptions?
    • Should be doable; just need to develop code when we're ready to do that

Textpresso presentation at next Alliance all-hands call

  • Who should present? Maybe have several people? Kimberly, Valerio, Michael for sections?
  • We can discuss at next Textpresso meeting
  • Should cover techniques and software, but keep it generally simple and comprehensible for a larger audience


September 13, 2018

Citace upload

  • Send files to Wen by 10am Tuesday (18th)

Genotype class

  • ?Genotype class proposal
  • Genotype_name tag: free text summary of the genotype
    • Can this be automatically generated from components? Ideally, yes, but may be difficult
    • Otherwise, can be manually written, as we have been doing, but is a bit denormalized and may require maintenance
  • Genotype_description tag: free-text description of the genotype
    • No precedent and probably not going to start now, so will remove from model
  • Genotype_components supertag: to collect genotype component objects and, where necessary, free text
    • We want to be able to express zygosity for each referenced object, likely requires a #Zygosity hash (and hence a ?Zygosity model)
    • ?Zygosity model can have three main tags: "Homozygous", "Heterozygous_with_wild_type", or "Heteroallelic_combination_with"
      • Heteroallelic_combination_with tag could further specify the type and identity of the object that is in heteroallelic combination with the original object
      • Since it is not ideal to store an arbitrary component in the #Zygosity hash, we should probably just state the zygosity as "Heteroallelic_combination" and for display purposes have an automated way to calculate which components affect the same locus/loci (if necessary)
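
The zygosity proposal above can be illustrated with a minimal sketch. It is written in Python rather than as an ACeDB model, and it is not the agreed ?Genotype or ?Zygosity model. The `Component` class, its `locus` field, the naive name format, and all gene and allele names are assumptions made up for the example. Each component stores only its own zygosity, and the heteroallelic partners are calculated for display by grouping components that affect the same locus.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Zygosity(Enum):
    # The three states discussed for the #Zygosity hash.
    HOMOZYGOUS = "Homozygous"
    HETEROZYGOUS_WITH_WILD_TYPE = "Heterozygous_with_wild_type"
    HETEROALLELIC_COMBINATION = "Heteroallelic_combination"


@dataclass(frozen=True)
class Component:
    # Hypothetical stand-in for a genotype component object.
    name: str    # allele or transgene name (made up in the examples)
    locus: str   # locus the component affects (assumed to be known)
    zygosity: Zygosity


def heteroallelic_partners(components: List[Component]) -> Dict[str, List[str]]:
    """Group heteroallelic components by the locus they affect (display only)."""
    by_locus: Dict[str, List[str]] = defaultdict(list)
    for c in components:
        if c.zygosity is Zygosity.HETEROALLELIC_COMBINATION:
            by_locus[c.locus].append(c.name)
    # A combination needs at least two components at the same locus.
    return {locus: sorted(names) for locus, names in by_locus.items() if len(names) > 1}


def genotype_name(components: List[Component]) -> str:
    """Build a naive free-text summary; real nomenclature rules are richer."""
    partners = heteroallelic_partners(components)
    seen_loci = set()
    parts = []
    for c in components:
        label = f"{c.locus}({c.name})"
        if c.zygosity is Zygosity.HOMOZYGOUS:
            parts.append(label)
        elif c.zygosity is Zygosity.HETEROZYGOUS_WITH_WILD_TYPE:
            parts.append(label + "/+")
        else:
            if c.locus in seen_loci:
                continue  # already emitted together with its partner(s)
            seen_loci.add(c.locus)
            names = partners.get(c.locus, [c.name])
            parts.append("/".join(f"{c.locus}({n})" for n in names))
    return "; ".join(parts)
```

Storing only the zygosity state per component avoids putting an arbitrary object reference in the hash, at the cost of recomputing the partner grouping whenever the genotype is displayed.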


September 20, 2018

Kimberly's talk at Rutgers

  • Kimberly went to worm meeting at Rutgers (10 labs using C. elegans? 6-7 totally worm-centric)
  • 30-40 attendees (PIs, postdocs, grad students)
  • Discussed tools and features at WB
  • Presented Alliance pages and Textpresso
  • PIs are enthusiastic about WB
  • Monica Driscoll made a good plug for Textpresso
  • People requesting FAQs and user guides (text and videos)
  • Monica suggested a WB tutorial for PIs ;p
  • Some people surprised about what they can accomplish using the tools available, like SimpleMine
  • Covered gene set enrichment, WormMine, SPELL, SimpleMine, ParaSite BioMart, Textpresso
  • Would be good to show people how to use Textpresso Central
  • Discussed micropublications, asked about negative results (precedent?)
  • Some asked about WB funding and Alliance plans
  • Can we make a within-page search available to find, for example, field names?
  • Some challenges in finding genes/proteins of a certain class
    • Had question about histone genes recently
    • Repeatedly have had questions about finding "ion channels"
    • Searching gene class with text pulls out lots of false positives
    • Could perform an analysis on particular classes of genes (e.g. histones or ion channels) and generate a micropublication providing the curated list
    • Can generate a WormMine template query to pull these out for each release
    • What classes of genes would we want to identify: histones, transcription factors, ion channels, protein kinases
    • Chris will look into WormMine templates using gene class info and look into pulling in protein motif information
    • We will ask other MODs and UniProt about how they deal with this issue


September 27, 2018

Update on the new AFP form and pipeline

  • Daniela, Kimberly, Juancarlos, and Valerio will update on the current status of the new AFP form and pipeline
    • Overall, the goal has been to incorporate as much Textpresso-based entity and data-type flagging as possible into the form
    • Move from author data flagging to author data validation wherever we can
    • Provide opportunities for authors to submit more detailed curation if they want
  • General: Data types flagged positive by SVM get a pre-checked checkbox
  • General: Question mark icons with help text
  • Gene recognition
    • Need to set a threshold of mentions; don't necessarily want all genes mentioned once
      • Can we show all genes, ranked by occurrence?
    • Don't want to overwhelm users
    • How are the genes identified? Via the Textpresso pipeline, string matching, consolidate multiple instances (protein, gene, etc.) into single gene result
    • Searches include supplemental materials
    • Cannot search by section of paper
    • Can we identify genes other than C. elegans/worm? Not doing this now; will stick to C. elegans for the time being
    • Will expand to non-elegans nematodes in future; will expand to other species when extending to other MODs/Alliance members
    • Chris: should we show the name of the gene as mentioned, verbatim, from the paper?
      • Karen: No, we should insist authors use the proper names
      • Chris: Meant cases where the paper references sequence names, but a public name has come out by the time the AFP form goes to authors, causing confusion
    • Can we pull genes from tables? We are pulling from PDF tables, but not supplemental Excel tables, for example
  • Gene model updates: checkbox yes/no
  • Species in paper
    • Including worm, mouse, human, yeast
    • Still more work to do on this front
  • Alleles recognized
    • Show list of allele names and WBVar IDs for confirmation
    • Can submit new alleles within the AFP form (just allele names, no genes or other info; keeping it simple)
  • Allele sequence change checkbox yes/no (link to Allele sequence info form)
  • Can there be a feedback option readily available? There is a comments section toward the end of the form under "Anything else?"
  • Transgenes handled like alleles
  • Antibodies
    • Newly generated antibodies: checkbox and text field (ask for details? consistency with alleles?); maybe shouldn't ask for antibody details, or can make details optional
    • Form for existing antibodies
  • Expression data
    • Anatomic expression in WT
    • Site of action (may be difficult to interpret user input; ask for example; make text details required)
    • Time of action
    • RNAseq data
  • Microarrays - just link out to GEO
  • Interactions (all SVM based, three checkboxes)
  • Phenotypes (SVMs, link to phenotype form)
  • Disease
    • Checkbox for worm orthologs of human disease gene, etc.
  • Comments section (to point out missing data types, provide general comments on form)
    • Ask for unpublished data and suggest micropublication
  • Final thank you and update contact info and lineage
  • CIT feedback
    • Maybe make font size larger
    • Mobile device compatible? Yes
    • Change "Anything else?" to "Anything else? Comments?"
    • Can people save and return later? Yes
      • How do we know they're finished? There is a "Finish and submit" button at end (but authors can still go back and make changes later)
      • Maybe move "Finish and submit" button to left panel so it is always visible? Maybe make the button stand alone?
    • If authors indicate there are physical interactions, can we distinguish elegans-elegans interactions vs. non-elegans or interspecies interactions? No, we cannot yet distinguish
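
The gene recognition points above (consolidating protein, gene, and sequence names into a single gene result, setting a threshold of mentions, and ranking genes by occurrence) can be sketched as follows. This is only an illustration, not the Textpresso pipeline: the `rank_genes` function, the `synonyms` lookup, the default threshold of 2, and the gene names are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def rank_genes(
    mentions: Iterable[str],
    synonyms: Dict[str, str],
    min_mentions: int = 2,
) -> List[Tuple[str, int]]:
    """Count mentions per gene and return genes at or above the threshold.

    `synonyms` maps alternative strings (protein name, sequence name) to one
    public gene name; strings that are not in the map count as themselves.
    """
    counts = Counter(synonyms.get(m, m) for m in mentions)
    # Most-mentioned first; ties are broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, n) for gene, n in ranked if n >= min_mentions]
```

Calling the function with `min_mentions=1` returns every matched gene in ranked order, which corresponds to the "show all genes, ranked by occurrence" option, while a higher threshold keeps the list short for authors.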

Genetics and G3 papers in Textpresso

  • These papers don't yet have a PMID when they first enter WB, only a DOI (and most of the time the DOI doesn't work yet)
  • DOI should work right away; Karen will look into if there's a problem/typo
  • Daniel needs to keep track, then go back and merge WBPapers once the PMID goes live
  • Kimberly or Karen may have to send papers directly to Daniel for uploading
  • Should Daniel only download papers with a PubMed ID? Yes, except for micropublications?
  • Need a separate pipeline for micropublications? Daniel is currently downloading the papers
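
The bookkeeping described above (finding which DOI-only WBPapers should be merged once a PMID goes live) could be partly automated by matching records on DOI. The sketch below assumes that the same paper can exist as two records, one with only a DOI and one with both a DOI and a PMID. The `PaperRecord` class, its field names, and the identifiers used in any example are hypothetical, not the actual WB paper schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class PaperRecord:
    # Hypothetical record; field names are illustrative only.
    wbpaper_id: str
    doi: Optional[str] = None
    pmid: Optional[str] = None


def normalize_doi(doi: str) -> str:
    """Lowercase the DOI and strip common prefixes so equal DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_candidates(records: Iterable[PaperRecord]) -> List[Tuple[str, str]]:
    """Return (doi_only_id, id_with_pmid) pairs of records that share a DOI."""
    records = list(records)
    with_pmid: Dict[str, str] = {}
    for r in records:
        if r.doi and r.pmid:
            with_pmid[normalize_doi(r.doi)] = r.wbpaper_id
    pairs = []
    for r in records:
        if r.doi and not r.pmid:
            target = with_pmid.get(normalize_doi(r.doi))
            if target is not None:
                pairs.append((r.wbpaper_id, target))
    return pairs
```

The output is only a list of candidates for a curator to review; it does not perform any merge, and it depends on the DOI having been entered correctly in both records.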

ParaSite (non-elegans) papers

  • Should Daniel be trying to download all of these papers? Many are hard to track down
  • Daniel should ask Michael Paulini