WormBase-Caltech Weekly Calls June 2018

From WormBaseWiki
Revision as of 14:10, 5 July 2018 by Cgrove

June 7, 2018

Alliance literature pipeline working group

  • Has anyone gotten back to Carol Bult about WB/Textpresso membership in the group? Not yet
  • Paul & Kimberly can join; Kimberly will sit in on group meetings

SPELL migration

  • SGD put SPELL into the cloud; will give us code to set up WB SPELL in cloud
  • Want to have common interface but two instances (yeast & worm)
  • Worm data in SPELL may be more complicated than the yeast data (different species, platforms, metadata, etc.)
  • Mike Cherry proposed to use GTEx to replace SPELL. (www.gtexportal.org)
  • Trying to get GTEx to have the same functionality as SPELL

Progress report

  • Wen can generate numbers of changes since last year
  • 5-year progress report coming up (for funding agencies)
  • Want a progress report-like document to give to SAB in July
  • WS259 - WS265

Paper author name

Lab class

  • Lab data is not clean/consistent; has gotten messy
  • Cecilia trying to clean it up
  • There is conflicting data from lab class, author class, and person class
  • Currently there is a largely redundant curation pipeline; considering pulling person info into lab class
  • When looking at lab page, if there is a problem with a person's info, changes would be made (requested) in the person class, not the lab class
  • Cecilia will contact labs to ask about correctness of information
  • Can have a discussion with Ann and Aric at CGC

How many C. elegans genes have "good" knockouts?

  • Paul giving talk next week, wants to report
  • Chris will look into
  • Mitani did 1,000 deletions in the last year

Server logs

  • SimpleMine logs are all coming from Amazon; will need to ask Todd about SimpleMine usage (from AWS stats?)

Migration of lab data

  • Data for labs was coming from Name Server (in Hinxton) - Comment:[PD] Not coming from the Nameserver, geneace was the only source of Lab data prior to handover.
  • Now Paul D has stopped curating lab info - Comment:[PD] Build config has been updated to take the majority of lab data from citace rather than geneace
Database        File                                    Class                   remove/format 
db=geneace      file=geneace_Laboratory.ace             class=Laboratory        format="Alleles WBVar\d{8}"     delete=Representative   delete=Registered_lab_members   delete=Past_lab_members delete=Allele_designation       delete=Strain_designation       delete=Address
db=citace       file=caltech_Laboratory.ace             class=Laboratory        delete=Alleles  format="Representative WBPerson\d{1,5}" format="Registered_lab_members WBPerson\d{1,5}" format="Past_lab_members WBPerson\d{1,5}"
  • Now that Cecilia is curating, will pull lab info from Postgres
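
A minimal sketch of how the two build-config lines above could be read programmatically. It assumes that each line is a series of key=value pairs, that delete= names a tag to drop, and that format= pairs a tag with a regular expression its values should match. The function name parse_config_line is hypothetical and is not part of the WormBase build code.

```python
import re

# A key=value pair; the value is either double-quoted or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse_config_line(line):
    """Parse one config line into a dict; delete= and format= may repeat."""
    entry = {"delete": [], "format": []}
    for key, value in PAIR.findall(line):
        value = value.strip('"')
        if key == "delete":
            entry["delete"].append(value)
        elif key == "format":
            # Assumed layout: "<Tag> <regex>", split on the first space.
            tag, pattern = value.split(" ", 1)
            entry["format"].append((tag, pattern))
        else:
            entry[key] = value
    return entry

# Shortened copy of the citace line from the notes above.
citace = parse_config_line(
    r'db=citace file=caltech_Laboratory.ace class=Laboratory delete=Alleles '
    r'format="Representative WBPerson\d{1,5}"'
)
print(citace["db"], citace["delete"], citace["format"])
```

Read this way, the citace line drops Alleles (which still come from geneace) and keeps person-related tags only when they carry a WBPerson ID.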

June 21, 2018


  • Lightning talks at project meeting; everyone should consider what they want to present (5 minutes with 5 minutes for questions)
  • 45-minute curation talk to SAB; Chris volunteered; anyone else interested?
  • What data types are being incorporated with the Alliance? What have we gained from those conversations?
  • How have Alliance interactions benefited us?
  • How does the Datomic migration affect our curation and data models?
  • Alliance data models should be the union of existing MOD data models, but this does not require curation of all attributes at each MOD
  • Do WB and our users benefit from creating a ?Genotype class?

Phenotype & Disease Face-to-Face

  • Reviewed phenotype curation practices at each MOD
  • Discussed strain and genotype classes
  • Generally it's felt that genotypes should only represent full genotypes of actual individuals or strains
  • For phenotype annotations WB will still need a mechanism to attribute the specific genotypic components that are responsible for the observed phenotype along with a way to capture the complete or background genotype for context
  • MGI considers strains to be part of a genotype (i.e. background inbred strain into which alleles are introduced)
  • WB and SGD consider genotype to be part of strain
  • WB will still consider moving ahead with instantiating a ?Genotype class for capturing genotypes (e.g. for disease models) that don't have an explicitly reported strain name; also for capturing transient genotypes like heterozygosity as well as paternal and maternal contributions/genotypes
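
One possible shape for the ?Genotype idea discussed above, sketched as Python dataclasses. All class and field names here are hypothetical and do not reflect an agreed WB or Alliance data model. The sketch only shows that a genotype can exist without a strain name, can carry zygosity and parental origin, and that an annotation can separate the causal components from the full genotype.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GenotypeComponent:
    allele: str                           # placeholder allele name
    zygosity: str = "homozygous"          # or "heterozygous" for transient genotypes
    inherited_from: Optional[str] = None  # "maternal", "paternal", or None if not recorded

@dataclass
class Genotype:
    components: List[GenotypeComponent] = field(default_factory=list)
    strain_name: Optional[str] = None     # None when no strain name was reported

@dataclass
class PhenotypeAnnotation:
    phenotype: str
    genotype: Genotype                    # full or background genotype, for context
    causal: List[GenotypeComponent]       # components held responsible for the phenotype

# Placeholder data: a heterozygous, maternally contributed allele in an unnamed background.
het = GenotypeComponent("allele-1", zygosity="heterozygous", inherited_from="maternal")
background = GenotypeComponent("allele-2")
model = Genotype(components=[het, background])
annotation = PhenotypeAnnotation("example phenotype", model, causal=[het])
print(annotation)
```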

June 28, 2018

Bulk emailing community for phenotypes

  • Chris & Juancarlos have now set up a pipeline to send requests in bulk
  • Can send 75 emails at a time; Gmail may limit to 500 per day (was hoping for 3,000 at a time)
  • Sent out 338 emails in past week; already received annotations for 18 papers, ~50 annotations
  • Papers go back to 1987; the earliest paper for which we received curation in the last week is from 2008
  • Have sent about 25 papers from 2018, got annotations for 5 of them
  • Ask SAB about email frequency
  • Would be good to link to a Textpresso search
  • Maybe make a link people can click on to indicate there is no phenotype data
  • Can a submitter chat with a curator? Maybe integrate Olark chat? Would be good to make available, indicate how to chat
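
The sending limits and response numbers above can be worked through with a short sketch. It assumes a hard cap of 500 emails per day and 75 per batch, as mentioned in the notes; the real Gmail limits may differ. The function plan_batches is a hypothetical helper, not part of the actual pipeline.

```python
def plan_batches(n_emails, batch_size=75, daily_limit=500):
    """Return one list of batch sizes per day, respecting both limits."""
    days = []
    remaining = n_emails
    while remaining > 0:
        left_today = min(remaining, daily_limit)
        batches = []
        while left_today > 0:
            size = min(batch_size, left_today)
            batches.append(size)
            left_today -= size
        days.append(batches)
        remaining -= sum(batches)
    return days

# The 338 emails sent so far fit in one day; 3,000 would take several days.
print(plan_batches(338))
print(len(plan_batches(3000)), "days for 3,000 emails")

# Response rate for 2018 papers, using the counts reported above.
sent_2018, annotated_2018 = 25, 5
print(f"2018 response rate: {annotated_2018 / sent_2018:.0%}")
```

Under these assumptions, the 3,000 emails originally hoped for in one send would be spread over six days.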


  • Chris & Paul will present (curation & intro respectively)
  • Tools? Raymond can bring up enrichment analysis
  • Project meeting
    • Everyone should sign up for a topic if they haven't already

Phenotype Ontology

  • Is the Phenotype ontology growing? Very slowly
  • Received several suggested phenotypes from community in last batch
  • Want to focus on logical definitions
  • Mammalian Phenotype ontology - precomposed terms with logical definitions
  • Raymond trying to use the SObA approach to display logically defined phenotype terms; needs to figure out how to effectively model it
  • Should get useful feedback at the ICBO meeting in August
    • Would be good to establish ahead of time what we want to get from the meeting
  • Want feedback on:
    • Pre-composed vs. post-composed
    • Granularity
    • Logical definitions
  • Looking up phenotypes: want to make more user friendly
    • We could map phenotype terms to processes to allow community to find relevant terms
    • Want to find "male hook" phenotypes; correct term is "male copulatory structure variant"; the term "hook" should find "male copulatory structure"
  • What is the best use of a user's time? curator's time?
  • How are ontology logical definitions being used now?
    • Kimberly: To calculate/reason the inferences based on ontology parentage
    • Logical definitions are not very visible on relevant pages (like AmiGO); should we make them more visible to users?
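
The "male hook" lookup problem above could be approached by expanding a user's search words through a keyword-to-structure mapping before matching phenotype term names. The sketch below assumes such a mapping exists. The dictionary contents, the term list, and the function find_terms are illustrative only and are not taken from the Phenotype Ontology files.

```python
# Hypothetical mapping from a specific anatomy keyword to the broader structure
# used in phenotype term names.
PART_OF = {"hook": "male copulatory structure"}

# Illustrative term names; only the first is taken from the notes above.
PHENOTYPE_TERMS = [
    "male copulatory structure variant",
    "egg laying variant",
]

def find_terms(query, terms=PHENOTYPE_TERMS, part_of=PART_OF):
    """Return terms matching the query or any broader structure its words map to."""
    query = query.lower()
    phrases = {query}
    for word in query.split():
        if word in part_of:
            phrases.add(part_of[word])
    return [term for term in terms if any(phrase in term for phrase in phrases)]

print(find_terms("male hook"))
print(find_terms("hook"))
```

A plain substring search for "hook" would return nothing here; the expansion step is what lets the query reach "male copulatory structure variant".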

Protein interactions

  • Now up to date; how should we publish this info?
  • Can make a blog post and/or micropublication and put in next NAR paper

Marie-Claire arriving next week

  • What's the plan?
  • Phenotype curation? GO curation?
  • Daniela wants help with Picture curation; could explain over Skype
    • Mega-sync system could help (Daniela will discuss with Valerio)
    • Images stored on canopus machine; can canopus be bypassed in sync process?
    • Mega has data limits (50GB+ requires a paid account)
    • Could set up a custom sync pipeline locally

GO vs. Phenotype

  • Lots of GO QC/review going on now; Marie-Claire could start there
  • Would it be easier to start out with GO or phenotype? Just different
  • Kimberly could walk her through next week
  • Paul will try to get Marie-Claire to Toronto for the SAB

Alliance U24 grant

  • ~30-pager due September (using familiar format)
  • Databases, knowledge-bases and tools will all be considered separately
  • Format changes after September
  • Scope could be focused on web portal and infrastructure (not curation, for example)