WormBase-Caltech Weekly Calls



2013 Meetings



July 11, 2013

Geneace daily dump

  • EBI is moving nameserver location
  • Getting real-time updates of gene list for genes in OA
  • Michael Paulini set up nightly geneace dumps to FTP site
  • We have gene file from nameserver: cgc name, public names, sequence name, live/dead status, gene IDs
  • What data do we want additionally? Synonyms?


Spica has officially been moved to new machine

  • Let Raymond know of any problems
  • Would be good to track all accounts on Spica (and any other machine)
    • Can use log of all user logins


AMIGO 2 still moving forward

  • AMIGO 2 might go live July 17th
  • We should be able to start configuring for WormBase at that point


Process Pages/WikiPathways

  • iFrame window doesn't work/load on Firefox; they are working on it
  • iFrame window interactive display somewhat problematic
  • Discussing Cytoscape as alternative?
  • Using Cytoscape to display pathways would require significant development
  • An app is available to load GPML from WikiPathways into Cytoscape, but JD couldn't get it working (yet)
  • Having all process-related interactions in an Interactions widget on Process Pages
    • Users need a clearer legend explaining what the different edges mean
    • We need to modify some edges (e.g. flat ends do not mean repression; maybe they should)


Author First Pass Forms

  • Currently we collect data from authors that we may not intend to curate (at least right away)
  • We can provide a disclaimer on the letter to authors explaining that some data may not be curated immediately
  • All data is catalogued


Sequence Feature curation

  • Xiaodong met with Gary Williams and Mary Ann Tuli at IWM
  • Enhancer curation?
  • Significant backlog on sequence feature curation
  • Margie Ho asking about curated enhancers, regulatory regions
  • Margie has 30 papers with highly annotated regulatory regions
  • Gary W. is prioritizing curation of these now
  • Gary will propose appropriate model changes (e.g. add "silencer" and "enhancer" to method for GBrowse display)


User use case: All G-protein coupled receptors expressed in AWC neurons

  • Quantitative expression data is not effectively linked to anatomy terms
  • Wen will propose model fix to accommodate this association
  • Genes expressed above some pre-defined threshold will be associated with a cell
  • For example, Ping (in Paul's lab) will be performing AWC single cell profiling
  • AWA and BAG neurons have been profiled by tiling arrays
  • Male linker cell by RNA-Seq (Erich and Mihoko)
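
The threshold-based association proposed above could be sketched roughly as follows. This is a minimal illustration only, not the proposed model change: the function name, the input layout of (gene, cell, value) tuples, the cutoff of 10.0, and the placeholder gene names are all assumptions, since the notes do not specify a threshold, unit, or data format.

```python
from collections import defaultdict

# Hypothetical default cutoff; the notes do not specify a value or unit.
DEFAULT_THRESHOLD = 10.0


def associate_genes_with_cells(measurements, threshold=DEFAULT_THRESHOLD):
    """Map each cell to the sorted list of genes expressed above `threshold`.

    `measurements` is an iterable of (gene, cell, expression_value) tuples.
    """
    genes_by_cell = defaultdict(set)
    for gene, cell, value in measurements:
        # Strictly above the cutoff: a value equal to the threshold is excluded.
        if value > threshold:
            genes_by_cell[cell].add(gene)
    return {cell: sorted(genes) for cell, genes in genes_by_cell.items()}


# Made-up placeholder genes and values, for illustration only.
example = [
    ("gene-1", "AWC", 42.0),
    ("gene-2", "AWC", 3.5),
    ("gene-2", "AWA", 18.0),
    ("gene-3", "BAG", 10.0),
]

print(associate_genes_with_cells(example))
```

With a mapping like this, the use case reduces to intersecting the AWC gene list with a list of G-protein coupled receptor genes.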


Curation strategies

  • Change our paper-by-paper curation
  • We may be able to make use of a Textpresso categorization program to tag papers
  • Caltech curators can then prioritize their curation based on a particular category or topic
  • We can look at the Textpresso paper and reconvene next week to discuss


July 18, 2013

Textpresso Paper categorization

  • Prioritization of papers based on: 1) SVM-Textpresso script categorization, and 2) Ideal prioritization scheme according to curation status
  • How does this tie into our grant quarterly progress report?
  • Can we create a putative milestone to achieve for the WS240 upload?
  • How do we consider backlog size with respect to priorities and categorization?
  • Even if a data-type backlog is small, it would be worth going back to older curation to check for accuracy and consistency
  • Will this pipeline be more efficient? We should define metrics to measure curation effectiveness/efficiency
    • Compare curation statistics of new pipeline to last year or two of curation statistics
  • Yuling can run existing SVM pipeline on corpus (supervised learning); unsupervised learning will require more human effort
  • We can provide lists of keywords to improve the categorization
  • There are 1750 papers with author first pass responses; Juancarlos emailed the paperIDs with the timestamp of each response


Upload

  • Next upload (WS240) deadline will be last Friday in September (Sept 27th, 2013)


ACEDB, Citace Minus

  • We will remove write-access from citace, moving personal files to citpub for write-access
  • Wen will send out a summary e-mail
  • Raymond can/will create individual user accounts (for those who want it) with access to personal versions of CitaceMinus and WS
    • Personal versions of CitaceMinus and WS will be write-accessible
    • Write e-mail to Raymond to request an account on Spica


Nightly GeneAce dumps

  • What data from nameserver do we want to pull nightly?
  • We need schema and existing data from Michael Paulini
  • Until curators (Kimberly?) tell Juancarlos what we want to extract, we're keeping the scripts that get data from the nameserver


July 25, 2013

Ontology Browsers

  • We are currently testing and developing AMIGO 2 for integration into WormBase
  • Example Browsers:
  • We are trying to decide on what would be an optimal ontology browsing experience
  • Browser experience features to consider:
    • Directed Acyclic Graph (DAG) view
      • Good for visualizing "Path to Root" relationships of an ontology term
    • Viewing children, parents, and/or siblings
      • Interactive expanding/collapsing of terms
      • Static tree or table views
      • "Inferred Tree View" - a compressed path to root tree view (in text format)
    • Clicking to open a new web page versus interactive browsing without reloads
  • Consensus is that interactive expandable and collapsible nodes would be ideal
  • We will use the Gene Ontology as a pilot ontology to first introduce/integrate into the WormBase site; other ontologies can come later
  • We will provide link outs to AMIGO and related sites/services to take advantage of their data and tools


Sequence features related to expression patterns and gene regulation interactions

  • Add a "Transcription factor binding" tag to Expr_pattern model? Not necessary?
  • Already captured in gene regulation interactions?
  • We need to discuss (site-wide) what model changes (if any) would be required to adequately capture this information
  • Gary Williams working on curating sequence features to link appropriately to Expr_pattern, Interaction (regulatory), and (maybe) Transgenes


Author First Pass paper word frequency analysis

  • Yuling performed word frequency analysis of whole papers and now sections
  • Karen took the "Titles" analysis, filtered out words with fewer than 10 hits, highlighted potential keywords
  • How should we go about choosing a topic?
  • We will choose AFP papers with "stress" in the title and assess curation status of each paper for our individual data types
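
The title analysis and topic selection described above might look something like the sketch below. It is not Yuling's actual script: the function names, the tokenization rule, the choice to count every occurrence of a word as a "hit", and the paper IDs and titles are all assumptions made for illustration.

```python
import re
from collections import Counter


def title_word_counts(titles):
    """Count occurrences of each lowercase alphabetic word across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(re.findall(r"[a-z]+", title.lower()))
    return counts


def frequent_words(counts, min_hits=10):
    """Keep only words with at least `min_hits` occurrences."""
    return {word: n for word, n in counts.items() if n >= min_hits}


def papers_with_keyword(titles_by_id, keyword):
    """Return sorted IDs of papers whose title contains `keyword` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword.lower()) + r"\b")
    return sorted(
        paper_id
        for paper_id, title in titles_by_id.items()
        if pattern.search(title.lower())
    )


# Made-up paper IDs and titles, for illustration only.
titles_by_id = {
    "paper-1": "Oxidative stress response in worms",
    "paper-2": "Neuronal development",
    "paper-3": "Stress and aging",
}

counts = title_word_counts(titles_by_id.values())
print(frequent_words(counts, min_hits=2))
print(papers_with_keyword(titles_by_id, "stress"))
```

Whole-word matching means a title containing only "stresses" would not be selected; matching on a stem instead is a design choice to settle when picking the topic.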