Revision as of 19:47, 22 July 2013

2013 Meetings

July 11, 2013

Geneace daily dump

EBI is moving nameserver location
Getting real-time updates of gene list for genes in OA
Michael Paulini set up nightly geneace dumps to FTP site
We have a gene file from the nameserver: CGC name, public name, sequence name, live/dead status, gene ID
What additional data do we want? Synonyms?
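
A minimal sketch of how a nightly dump with the fields listed above could be read. The tab-delimited layout, the column order, and the sample rows are all assumptions made for illustration; the actual dump format is not specified in these notes.

```python
import csv
import io

# Assumed column order; the real dump layout is not given in these notes.
FIELDS = ["gene_id", "cgc_name", "sequence_name", "public_name", "status"]


def parse_gene_dump(text):
    """Return a dict mapping gene ID to a record of its names and status."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter="\t")
    genes = {}
    for row in reader:
        gene_id = row["gene_id"]
        if not gene_id:
            continue  # ignore rows without an identifier
        # Store empty or missing name fields as None rather than "".
        genes[gene_id] = {key: (row[key] or None) for key in FIELDS[1:]}
    return genes


def live_gene_ids(genes):
    """Return the sorted IDs of genes whose status is 'live' (case-insensitive)."""
    return sorted(
        gene_id
        for gene_id, record in genes.items()
        if (record["status"] or "").lower() == "live"
    )


# Made-up sample rows, not real gene records.
SAMPLE = (
    "WBGene99999901\tabc-1\tX1.1\tabc-1\tLive\n"
    "WBGene99999902\t\tX2.2\tX2.2\tDead\n"
)

if __name__ == "__main__":
    genes = parse_gene_dump(SAMPLE)
    print(live_gene_ids(genes))
```

If synonyms are added to the dump later, they could be carried as one extra column in `FIELDS`.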

Spica has officially been moved to new machine

Let Raymond know of any problems
Would be good to track all accounts on Spica (and any other machine)
- Can use log of all user logins
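
One way the login log could be summarized, as a rough sketch. It assumes the log looks like the output of the Unix `last` command, where the first whitespace-separated field of each line is the user name and pseudo-entries such as `reboot` and the trailing `wtmp begins` line should be skipped. The sample lines and user names are made up.

```python
from collections import Counter

# Non-user entries that appear in `last`-style output (assumed input format).
PSEUDO_USERS = {"reboot", "shutdown", "wtmp"}


def count_logins(lines):
    """Count login records per user name from `last`-style log lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # blank line
        user = parts[0]
        if user in PSEUDO_USERS:
            continue  # system entries, not accounts
        counts[user] += 1
    return counts


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    "alice    pts/0   host1   Mon Jul  8 09:12 - 10:01  (00:49)",
    "bob      pts/1   host2   Mon Jul  8 11:30 - 11:45  (00:15)",
    "alice    pts/0   host1   Tue Jul  9 14:02 - 15:20  (01:18)",
    "reboot   system boot     Tue Jul  9 08:00",
    "",
    "wtmp begins Mon Jul  1 00:00:01 2013",
]

if __name__ == "__main__":
    for user, n in sorted(count_logins(SAMPLE_LOG).items()):
        print(user, n)
```

The resulting set of user names gives a starting list of accounts that are actually in use; accounts that never log in would not appear and would need to be checked separately.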

AmiGO 2 still moving forward

Process Pages/WikiPathways

iFrame window doesn't work/load on Firefox; they are working on it
iFrame window interactive display somewhat problematic
Discussing Cytoscape as alternative?
Using Cytoscape to display pathways would require significant development
An app is available to load GPML from WikiPathways into Cytoscape, but JD couldn't get it working (yet)
Having all process-related interactions in an Interactions widget on Process Pages
- Users need a clearer legend explaining what the different edges mean
- We need to modify some edges (e.g. flat ends do not mean repression; maybe they should)

Author First Pass Forms

Currently we collect data from authors that we may not intend to curate (at least not right away)
We can provide a disclaimer on the letter to authors explaining that some data may not be curated immediately
All data is catalogued

Sequence Feature curation

Xiaodong met with Gary Williams and Mary Ann Tuli at IWM
Enhancer curation?
Significant backlog on sequence feature curation
Margie Ho asking about curated enhancers, regulatory regions
Margie has 30 papers with highly annotated regulatory regions
Gary W. is prioritizing curation of these now
Gary will propose appropriate model changes (e.g. add "silencer" and "enhancer" to method for GBrowse display)

User use case: All G-protein coupled receptors expressed in AWC neurons

Curation strategies

Change our paper-by-paper curation
We may be able to make use of a Textpresso categorization program to tag papers
Caltech curators can then prioritize their curation based on a particular category or topic
We can look at the Textpresso paper and reconvene next week to discuss

Textpresso Paper categorization

Prioritization of papers based on: 1) SVM-Textpresso script categorization, and 2) Ideal prioritization scheme according to curation status
How does this tie into our grant quarterly progress report?
Can we create a putative milestone to achieve for the WS240 upload?
How do we weigh backlog size with respect to priorities and categorization?
Even if a data-type backlog is small, it would be worth going back to older curation to check for accuracy and consistency
Will this pipeline be more efficient? We should define metrics to measure curation effectiveness/efficiency
- Compare curation statistics from the new pipeline with those from the last year or two
Yuling can run existing SVM pipeline on corpus (supervised learning); unsupervised learning will require more human effort
We can provide lists of keywords to improve the categorization
There are 1750 papers with author first pass responses; Juancarlos emailed the paper IDs with the timestamp of each response
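
The two-part prioritization discussed above (classifier category plus curation status) could be combined roughly as follows. This is only a sketch: the `Paper` record, the `svm_score` field, the category labels, and the paper IDs are hypothetical and do not reflect the actual output of the SVM-Textpresso scripts.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    paper_id: str     # hypothetical identifier
    category: str     # data type assigned by the classifier (assumed label)
    svm_score: float  # assumed classifier confidence, higher = more likely
    curated: bool     # whether the paper has been curated for this category


def prioritize(papers, category):
    """Return uncurated papers in a category, highest classifier score first."""
    candidates = [p for p in papers if p.category == category and not p.curated]
    return sorted(candidates, key=lambda p: p.svm_score, reverse=True)


# Made-up records for illustration only.
PAPERS = [
    Paper("paper-A", "expression", 0.91, False),
    Paper("paper-B", "expression", 0.97, True),
    Paper("paper-C", "expression", 0.64, False),
    Paper("paper-D", "phenotype", 0.88, False),
]

if __name__ == "__main__":
    for paper in prioritize(PAPERS, "expression"):
        print(paper.paper_id, paper.svm_score)
```

The length of each category's list is one simple backlog measure that could be tracked from one upload to the next.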

Upload

ACEDB, Citace Minus

We will remove write access from citace and move personal files to citpub, which will be writable
Wen will send out a summary e-mail
Raymond can/will create individual user accounts (for those who want one) with access to personal versions of CitaceMinus and WS
- Personal versions of CitaceMinus and WS will be write-accessible
- Write e-mail to Raymond to request an account on Spica

Nightly GeneAce dumps

What data from nameserver do we want to pull nightly?
We need schema and existing data from Michael Paulini
Until curators (Kimberly?) tell Juancarlos what we want to extract, we're keeping the scripts that get data from the nameserver.
