Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki

Revision as of 19:47, 22 July 2013

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings







July 11, 2013

Geneace daily dump

  • EBI is moving nameserver location
  • Getting real-time updates of gene list for genes in OA
  • Michael Paulini set up nightly geneace dumps to FTP site
  • We have gene file from nameserver: cgc name, public names, sequence name, live/dead status, gene IDs
  • What data do we want additionally? Synonyms?

Spica has officially been moved to new machine

  • Let Raymond know of any problems
  • Would be good to track all accounts on Spica (and any other machine)
    • Can use log of all user logins

AMIGO 2 still moving forward

  • AMIGO 2 might go live July 17th
  • We should be able to start configuring for WormBase at that point

Process Pages/WikiPathways

  • iFrame window doesn't work/load on Firefox; they are working on it
  • iFrame window interactive display somewhat problematic
  • Discussing Cytoscape as alternative?
  • Using Cytoscape to display pathways would require significant development
  • Some app available to load GPML from WikiPathways into Cytoscape, but JD couldn't get it working (yet)
  • Having all process-related interactions in an Interactions widget on Process Pages
    • Users need a clearer legend explaining what the different edges mean
    • We need to modify some edges (e.g. flat ends do not mean repression; maybe they should)

Author First Pass Forms

  • Currently we collect data from authors that we may not intend to curate (at least not right away)
  • We can provide a disclaimer on the letter to authors explaining that some data may not be curated immediately
  • All data is catalogued

Sequence Feature curation

  • Xiaodong met with Gary Williams and Mary Ann Tuli at IWM
  • Enhancer curation?
  • Significant backlog on sequence feature curation
  • Margie Ho asking about curated enhancers, regulatory regions
  • Margie has 30 papers with highly annotated regulatory regions
  • Gary W. is prioritizing curation of these now
  • Gary will propose appropriate model changes (e.g. adding "silencer" and "enhancer" as methods for GBrowse display)

User use case: All G-protein coupled receptors expressed in AWC neurons

  • Quantitative expression data is not effectively linked to anatomy terms
  • Wen will propose model fix to accommodate this association
  • Genes expressed above some pre-defined threshold will be associated with a cell
  • For example, Ping (in Paul's lab) will be performing AWC single-cell profiling
  • AWA and BAG neurons have been profiled by tiling arrays
  • Male linker cell by RNA-Seq (Erich and Mihoko)
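The threshold rule discussed above (a gene is associated with a cell when its expression exceeds a pre-defined cutoff) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the data layout, the gene placeholders, and the cutoff value are all hypothetical and do not reflect the model change Wen will propose.

```python
from collections import defaultdict


def genes_by_cell(expression, threshold):
    """Map each cell to the sorted genes whose expression exceeds threshold.

    `expression` is a hypothetical dict keyed by (gene, cell) pairs with a
    numeric expression level as the value.
    """
    associations = defaultdict(list)
    for (gene, cell), level in expression.items():
        # Only measurements strictly above the cutoff create an association.
        if level > threshold:
            associations[cell].append(gene)
    return {cell: sorted(genes) for cell, genes in associations.items()}


# Placeholder values for illustration only; not real measurements.
expression = {
    ("gene-1", "AWC"): 12.5,
    ("gene-2", "AWC"): 0.8,
    ("gene-3", "AWC"): 7.1,
    ("gene-1", "AWA"): 0.2,
}

result = genes_by_cell(expression, threshold=5.0)
print(result)
```

A query such as the use case above would then intersect the genes listed for AWC with a separately maintained list of G-protein coupled receptors.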

Curation strategies

  • Change our paper-by-paper curation
  • We may be able to make use of a Textpresso categorization program to tag papers
  • Caltech curators can then prioritize their curation based on a particular category or topic
  • We can look at the Textpresso paper and reconvene next week to discuss

July 18, 2013

Textpresso Paper categorization

  • Prioritization of papers based on: 1) SVM-Textpresso script categorization, and 2) Ideal prioritization scheme according to curation status
  • How does this tie into our grant quarterly progress report?
  • Can we create a putative milestone to achieve for the WS240 upload?
  • How do we weigh backlog size against priorities and categorization?
  • Even if a data-type backlog is small, it would be worth going back to older curation to check for accuracy and consistency
  • Will this pipeline be more efficient? We should define metrics to measure curation effectiveness/efficiency
    • Compare curation statistics of new pipeline to last year or two of curation statistics
  • Yuling can run existing SVM pipeline on corpus (supervised learning); unsupervised learning will require more human effort
  • We can provide lists of keywords to improve the categorization
  • There are 1750 papers with author first pass responses; Juancarlos emailed the paperIDs with the timestamp of each response


  • Next upload (WS240) deadline will be last Friday in September (Sept 27th, 2013)

ACEDB, Citace Minus

  • We will remove write access from citace; personal files that need write access will move to citpub
  • Wen will send out a summary e-mail
  • Raymond can/will create individual user accounts (for those who want it) with access to personal versions of CitaceMinus and WS
    • Personal versions of CitaceMinus and WS will be write-accessible
    • Write e-mail to Raymond to request an account on Spica

Nightly GeneAce dumps

  • What data from nameserver do we want to pull nightly?
  • We need schema and existing data from Michael Paulini
  • Until curators (Kimberly?) tell Juancarlos what we want to extract, we're keeping the scripts that get data from the nameserver