WormBase-Caltech Weekly Calls August 2013
August 1, 2013
Quarterly Progress Reports
- Capturing curation stats from the Curation Status form
- What data types do we want to capture curation stats for that we are not currently?
- We have frequent database dumps that can be read for stats
- We can capture the stats table statically on a regular basis (daily)
- form at http://tazendra.caltech.edu/~postgres/cgi-bin/curation_status.cgi
- cronjob to get data from "Curation Statistics Page" button at
- deposits files every day at 5am to :
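A minimal sketch of the daily static capture described above, assuming the stats page can be fetched with a plain GET request. The output directory, the file naming, and the helper names (fetch_page, save_snapshot) are hypothetical; any form parameters submitted by the "Curation Statistics Page" button are not known here and are not shown.

```python
import datetime
import pathlib
import urllib.request

# URL of the Curation Status form as given in the notes above.
FORM_URL = "http://tazendra.caltech.edu/~postgres/cgi-bin/curation_status.cgi"


def snapshot_path(out_dir, day):
    """Return the dated file path for one day's snapshot (naming is hypothetical)."""
    return pathlib.Path(out_dir) / f"curation_stats_{day.isoformat()}.html"


def save_snapshot(content, out_dir, day=None):
    """Write the fetched page bytes to a dated file and return its path."""
    day = day or datetime.date.today()
    path = snapshot_path(out_dir, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return path


def fetch_page(url=FORM_URL, timeout=60):
    """Fetch the page as bytes; assumes a plain GET is enough (not confirmed)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

A script that calls save_snapshot(fetch_page(), "some_output_dir") could then be scheduled with a crontab entry such as "0 5 * * *" to run every day at 5am. Keeping one dated file per day means stats for any past date can be read back without re-querying the database.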
RNA-Seq and Tiling Array data
- Data in SPELL
- Wen found a lot more non-modENCODE data sets
- May use SVM for expression cluster data
- Gene IDs can be found from original paper or data set
- Up-to-date mapping to genes is not currently done
AMIGO2 (WormBase Ontology Browser)
- Raymond and Juancarlos have taken AMIGO2 infrastructure to make an ontology browser for integration into WormBase
- GO Term focus page demo
- Graph view shows path to root (DAG view)
- Inferred tree view shows:
- Ancestor terms, no annotation numbers
- Main term and children, with annotation numbers (inferred, term and descendant annotations)
- Annotation numbers link to list of genes
- Will not show "direct" annotations, only inferred
- Sibling term displays: list parents with option to expand to see siblings of the main term
- Separate expandable/collapsible tree of ontology ("Browse entire ontology")
- Widget can be coded to integrate the ontology browser
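The inferred annotation numbers (term and descendant annotations) can be illustrated with a small sketch. It assumes the number for a term is the count of distinct genes annotated to that term or to any of its descendants in the DAG; whether the browser counts genes or individual annotations is not stated in these notes. All term and gene names below are made up.

```python
def descendants(term, children):
    """Return the set of all terms reachable from `term` via child links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def inferred_genes(term, children, direct):
    """Genes annotated to `term` itself or to any of its descendants."""
    genes = set(direct.get(term, ()))
    for d in descendants(term, children):
        genes |= set(direct.get(d, ()))
    return genes


# Toy DAG with made-up term and gene names; "c" has two parents.
children = {"root": ["a", "b"], "a": ["c"], "b": ["c"]}
direct = {"a": {"gene1"}, "b": {"gene2"}, "c": {"gene1", "gene3"}}

counts = {t: len(inferred_genes(t, children, direct)) for t in ["root", "a", "b", "c"]}
print(counts)  # {'root': 3, 'a': 2, 'b': 3, 'c': 2}
```

Collecting genes into a set means a gene annotated to both a term and one of its descendants, or reached through two parents, is counted only once.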
- Word frequency
- We chose papers from the Author First Pass (AFP) list with 'stress'
- About 40 papers in list, varied topics ('stress' is a broad term)
- Curation essentially now complete for most data types
- Expanding beyond AFP?
- Chris will draw up preliminary tree of topics and send around
- We can discuss, edit, and expand as a group
- We want to 1) Collect positive and negative training papers and 2) Manually generate a list of key words to use for training
- Todd proposes for paper pages on WormBase:
- Show a table of flagged data types for a paper?
- Give users a sense of where paper is in the curation pipeline
August 8, 2013
New Spica now has a closed (private) 'citace' account
- citpub account is accessible to everyone with password
- People can create their own spica accounts
- Personal accounts are encouraged so as to avoid saving changes to citpub database
Worm Ontology Browser
- Raymond has set up a server
- Browsing should be faster now
- Should be transferable to the Amazon cloud
- Raymond will establish a WormBase development environment
- Paper categorization
- Depth vs breadth of topics (number of papers?)
- 'Stress' has been a pilot topic, but is a very broad topic
- Will work on generating subcategories of 'Stress' on the Paper Categorization Wiki page
- Curators can analyze the Author First Pass list of 'Stress' papers as well as entire backlog/corpus
- Goals of 'covering' a topic?
- 'Complete' and vetted process page, Wikipathway
- Promote 'featured processes' on WormBase for a given release
- We should collect positive and negative papers (for a given topic) for SVM training
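A minimal sketch of how positive and negative papers plus a manually generated key word list could be turned into labeled training data. The key words and paper snippets are made up, and the function names are hypothetical. The SVM itself is not shown; the resulting rows and labels are the kind of input an SVM library would take.

```python
import re
from collections import Counter


def keyword_vector(text, keywords):
    """Count occurrences of each keyword in `text` (case-insensitive, whole words).

    Keywords are expected to be lowercase single words.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return [counts[k] for k in keywords]


def build_training_set(positive, negative, keywords):
    """Return (X, y): one keyword-count row per paper, label 1 = positive, 0 = negative."""
    X = [keyword_vector(t, keywords) for t in positive + negative]
    y = [1] * len(positive) + [0] * len(negative)
    return X, y


# Made-up key words and paper snippets, for illustration only.
keywords = ["stress", "oxidative", "heat"]
positive = ["Oxidative stress response under heat stress"]
negative = ["Vulval development and cell fate"]

X, y = build_training_set(positive, negative, keywords)
print(X, y)  # [[2, 1, 1], [0, 0, 0]] [1, 0]
```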
Curators should check CGIs
- Submission forms and other CGIs may have been altered (only in the publicly accessible "azurebrd" account; you can see it in the URL)
August 15, 2013
We have geneace and can get data for nightly dump from geneace from Michael P
Raymond, Kimberly, Paul S
Raymond is working on the development environment. Will host the server at Caltech for now. Need to have a single backend. It depends on >32GB of RAM; will try to make it less memory dependent.
Kimberly tells us about the new GO Model, will send us link to wiki.
New relationships in LEGO are not fully defined yet. Need an ontology editor that is being developed in the direction we want. OBO to OWL is defined.
Xiaodong, Daniela, Raymond, Wen meeting to discuss expression pattern curation.
Bottom up: look at the data and see how to fit it with the existing structure into the database. Want a top-down approach based on the types of data and relationships, which will help us find holes in data modeling.
Daniela, Paul S, Raymond, Gary S
Contacted the people responsible, but the person was on vacation until yesterday.
Do they have functional genotype data we don't have? Everything should be in, the paper has been curated, it's about having the movie files.
Are their pages useful? Not sure, just checked if we can take their movie files. Should we link to their pages? Igor and maybe Raymond talked about setting up the links. We can change the movie model and link through there. There is a database tag in the RNAi model. We don't have the links for other data, so we'd have to look.
Hinxton is not going to work on all tiling array or RNA-seq data. Wen will work on the biology part of experimental objects and push Hinxton to map to genes. Will Hinxton take on migration to the build? Focus on what we can do.