WormBase-Caltech Weekly Calls
2013 Meetings
July 11, 2013
Geneace daily dump
- EBI is moving nameserver location
- Getting real-time updates of gene list for genes in OA
- Michael Paulini set up nightly geneace dumps to FTP site
- We have gene file from nameserver: cgc name, public names, sequence name, live/dead status, gene IDs
- What data do we want additionally? Synonyms?
Spica has officially been moved to new machine
- Let Raymond know of any problems
- Would be good to track all accounts on Spica (and any other machine)
- Can use log of all user logins
AMIGO 2 still moving forward
- AMIGO 2 might go live July 17th
- We should be able to start configuring for WormBase at that point
Process Pages/WikiPathways
- iFrame window doesn't work/load on Firefox; they are working on it
- iFrame window interactive display somewhat problematic
- Discussing Cytoscape as alternative?
- Using Cytoscape to display pathways would require significant development
- An app is available to load GPML from WikiPathways into Cytoscape, but JD couldn't get it working (yet)
- Having all process-related interactions in an Interactions widget on Process Pages
- Users need a clearer legend explaining what the different edges mean
- We need to modify some edges (e.g. flat ends do not mean repression; maybe they should)
Author First Pass Forms
- Currently we collect data from authors that we may not intend to curate (at least right away)
- We can provide a disclaimer on the letter to authors explaining that some data may not be curated immediately
- All data is catalogued
Sequence Feature curation
- Xiaodong met with Gary Williams and Mary Ann Tuli at IWM
- Enhancer curation?
- Significant backlog on sequence feature curation
- Margie Ho asking about curated enhancers, regulatory regions
- Margie has 30 papers with highly annotated regulatory regions
- Gary W. is prioritizing curation of these now
- Gary will propose appropriate model changes (e.g. Add "silencer" and "enhancer" to method for GBrowse display)
User use case: All G-protein coupled receptors expressed in AWC neurons
- Quantitative expression data is not effectively linked to anatomy terms
- Wen will propose model fix to accommodate this association
- Genes expressed above some pre-defined threshold will be associated with a cell
- For example, Ping (in Paul's lab) will be performing AWC single-cell profiling
- AWA and BAG neurons have been profiled by tiling arrays
- Male linker cell by RNA-Seq (Erich and Mihoko)
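The threshold idea in this use case can be sketched as follows. This is only an illustration, assuming expression measurements arrive as (gene, cell, level) tuples; the threshold value, the function names, and the example gene names are all hypothetical and are not part of any proposed model change.

```python
from collections import defaultdict

# Hypothetical threshold; the notes do not specify a value or unit.
EXPRESSION_THRESHOLD = 10.0


def genes_by_cell(measurements, threshold=EXPRESSION_THRESHOLD):
    """Map each cell to the sorted genes expressed above the threshold.

    measurements: iterable of (gene, cell, expression_level) tuples.
    """
    associations = defaultdict(set)
    for gene, cell, level in measurements:
        if level > threshold:
            associations[cell].add(gene)
    return {cell: sorted(genes) for cell, genes in associations.items()}


def genes_in_cell_and_class(associations, cell, gene_class):
    """Return genes associated with a cell that also belong to a gene class."""
    return [gene for gene in associations.get(cell, []) if gene in gene_class]


# Made-up example data (placeholder gene names, not real measurements).
measurements = [
    ("gene-1", "AWC", 25.0),
    ("gene-2", "AWC", 3.0),
    ("gene-3", "AWC", 40.0),
    ("gene-1", "AWA", 12.0),
]
gpcr_genes = {"gene-1", "gene-2"}  # stand-in for a curated GPCR gene list

associations = genes_by_cell(measurements)
print(genes_in_cell_and_class(associations, "AWC", gpcr_genes))
```

Once gene-to-cell associations exist, a query such as "all GPCRs expressed in AWC" reduces to intersecting a cell's gene list with a gene class list.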
Curation strategies
- Change our paper-by-paper curation
- We may be able to make use of a Textpresso categorization program to tag papers
- Caltech curators can then prioritize their curation based on a particular category or topic
- We can look at the Textpresso paper and reconvene next week to discuss
July 18, 2013
Textpresso Paper categorization
- Prioritization of papers based on: 1) SVM-Textpresso script categorization, and 2) Ideal prioritization scheme according to curation status
- How does this tie into our grant quarterly progress report?
- Can we create a putative milestone to achieve for the WS240 upload?
- How do we consider backlog size with respect to priorities and categorization?
- Even if a data-type backlog is small, it would be worth going back to older curation to check for accuracy and consistency
- Will this pipeline be more efficient? We should define metrics to measure curation effectiveness/efficiency
- Compare curation statistics from the new pipeline with those from the last year or two
- Yuling can run existing SVM pipeline on corpus (supervised learning); unsupervised learning will require more human effort
- We can provide lists of keywords to improve the categorization
- There are 1750 papers with author first pass responses; Juancarlos emailed the paper IDs with the timestamp of each response
Upload
- Next upload (WS240) deadline will be last Friday in September (Sept 27th, 2013)
ACEDB, Citace Minus
- We will remove write-access from citace, moving personal files to citpub for write-access
- Wen will send out a summary e-mail
- Raymond can/will create individual user accounts (for those who want it) with access to personal versions of CitaceMinus and WS
- Personal versions of CitaceMinus and WS will be write-accessible
- Write e-mail to Raymond to request an account on Spica
Nightly GeneAce dumps
- What data from nameserver do we want to pull nightly?
- We need schema and existing data from Michael Paulini
- Until curators (Kimberly?) tell Juancarlos what we want to extract, we're keeping the scripts that get data from the nameserver.
July 25, 2013
Ontology Browsers
- We are currently testing and developing AMIGO2 for integration into WormBase
- most recent version at http://mangolassi.caltech.edu/~azurebrd/cgi-bin/testing/amigo/wobr/amigo.cgi
- Example Browsers:
- Ontology Lookup Service (OLS) from EBI
- MISO (Sequence Ontology Browser from www.sequenceontology.org)
- OBO-Edit
- Protege
- We are trying to decide on what would be an optimal ontology browsing experience
- Browser experience features to consider:
- Directed Acyclic Graph (DAG) view
- Good for visualizing "Path to Root" relationships of an ontology term
- Viewing children, parents, and/or siblings
- Interactive expanding/collapsing of terms
- Static tree or table views
- "Inferred Tree View" - a compressed path to root tree view (in text format)
- Clicking to open a new web page versus interactive browsing without reloads
- Consensus is that interactive expandable and collapsible nodes would be ideal
- We will use the Gene Ontology as a pilot ontology to first introduce/integrate into the WormBase site; other ontologies can come later
- We will provide link outs to AMIGO and related sites/services to take advantage of their data and tools
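The browsing features discussed above (path to root, children, siblings) all reduce to simple traversals of a directed acyclic graph. The following is a minimal sketch, assuming the ontology is held as a dictionary mapping each term to its list of parents; the term names are placeholders and the functions are hypothetical, not part of AMIGO or any WormBase code.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def children(term, parents):
    """Return the direct children of `term`, sorted by name."""
    return sorted(child for child, ps in parents.items() if term in ps)


def siblings(term, parents):
    """Return terms that share at least one parent with `term`."""
    found = set()
    for parent in parents.get(term, ()):
        found.update(children(parent, parents))
    found.discard(term)
    return sorted(found)


# Toy ontology with placeholder term names; "C" has two parents,
# which is what makes this a DAG rather than a tree.
parents = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],
    "D": ["A"],
}

print(sorted(ancestors("C", parents)))
print(children("A", parents))
print(siblings("C", parents))
```

Because a term can have more than one parent, a "path to root" view has to show a set of ancestors rather than a single chain, which is one reason an interactive DAG display was preferred over a static tree.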
Sequence features related to expression patterns and gene regulation interactions
- Add a "Transcription factor binding" tag to Expr_pattern model? Not necessary?
- Already captured in gene regulation interactions?
- We need to discuss (site-wide) what model changes (if any) would be required to adequately capture this information
- Gary Williams working on curating sequence features to link appropriately to Expr_pattern, Interaction (regulatory), and (maybe) Transgenes
Author First Pass paper word frequency analysis
- Yuling performed word frequency analysis of whole papers and now sections
- Karen took the "Titles" analysis, filtered out words with fewer than 10 hits, and highlighted potential keywords
- How should we go about choosing a topic?
- We will choose AFP papers with "stress" in the title and assess curation status of each paper for our individual data types
- WBPaper00031692
- WBPaper00031694
- WBPaper00031842
- WBPaper00031873
- WBPaper00032236
- WBPaper00032241
- WBPaper00033114
- WBPaper00032321
- WBPaper00034757
- WBPaper00035114
- WBPaper00036083
- WBPaper00036413
- WBPaper00036090
- WBPaper00036135
- WBPaper00035965
- WBPaper00037147
- WBPaper00037595
- WBPaper00037886
- WBPaper00038233
- WBPaper00039783
- WBPaper00039990
- WBPaper00038093
- WBPaper00040006
- WBPaper00039878
- WBPaper00039835
- WBPaper00040166
- WBPaper00039788
- WBPaper00040384
- WBPaper00038464
- WBPaper00040697
- WBPaper00040849
- WBPaper00041075
- WBPaper00040133
- WBPaper00040902
- WBPaper00041277
- WBPaper00041295
- WBPaper00041568
- WBPaper00041528
- WBPaper00041150
- WBPaper00041610
- WBPaper00041663
- WBPaper00041866
- WBPaper00042148
- WBPaper00042067
- WBPaper00042178
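The title analysis and topic selection described in this item can be sketched roughly as below. This is not Yuling's script: the tokenization rule, function names, paper identifiers, and titles are hypothetical, and the example uses a cutoff of 2 only because the toy data set is tiny (the notes used 10).

```python
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", title.lower())


def title_word_counts(titles):
    """Count how often each word appears across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize(title))
    return counts


def frequent_words(counts, min_hits=10):
    """Keep only words with at least `min_hits` occurrences."""
    return {word: n for word, n in counts.items() if n >= min_hits}


def papers_with_keyword(titles_by_paper, keyword):
    """Return IDs of papers whose title contains `keyword` as a whole word."""
    keyword = keyword.lower()
    return sorted(
        paper_id
        for paper_id, title in titles_by_paper.items()
        if keyword in tokenize(title)
    )


# Made-up identifiers and titles, for illustration only.
titles_by_paper = {
    "paper-1": "Oxidative stress response in worms",
    "paper-2": "Stress and lifespan",
    "paper-3": "Neuronal development",
}

counts = title_word_counts(titles_by_paper.values())
print(frequent_words(counts, min_hits=2))
print(papers_with_keyword(titles_by_paper, "stress"))
```

Whole-word matching means a title containing only "stressed" would not be selected; a substring or stemmed match is an alternative if broader recall is wanted.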