WormBase-Caltech Weekly Calls November 2011
November 3, 2011
Gene ID mapping problems (as discussed in site-wide call)
- Does CIT need to worry about consistency of gene IDs over database builds?
- Hyper-linking entities in papers could be an issue
- Acquire an ID for every gene for every new genome?
- When marking up a paper, separate genes into two types: stable vs. unstable
- Stable genes map to stable ID
- Unstable genes map with a version number
- Should WormBase communicate/work with DCC personnel for data wrangling in the post-DCC era?
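The stable/unstable split discussed above can be sketched roughly as follows. This is only an illustration under assumptions: the `GeneRef` class, the `make_gene_ref` helper, the `ID.version` label format, and the example IDs are all hypothetical and not an agreed WormBase convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneRef:
    """A reference to a gene as marked up in a paper (hypothetical structure)."""
    gene_id: str
    version: Optional[int] = None  # None means the gene is considered stable

    def label(self) -> str:
        # Stable genes use the bare ID; unstable genes carry a version suffix.
        # The "ID.version" format is an assumption made for this sketch.
        if self.version is None:
            return self.gene_id
        return f"{self.gene_id}.{self.version}"


def make_gene_ref(gene_id: str, stable: bool, build_version: int) -> GeneRef:
    """Attach a version number only to genes flagged as unstable."""
    if stable:
        return GeneRef(gene_id)
    return GeneRef(gene_id, build_version)


if __name__ == "__main__":
    print(make_gene_ref("GENE_STABLE_1", True, 3).label())
    print(make_gene_ref("GENE_UNSTABLE_1", False, 3).label())
```

With this layout a hyperlink from a paper to a stable gene never changes, while a link to an unstable gene records which version of the gene model it was made against.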
Updating Predicted Gene Interactions
- Wei wei asking how to update with WormBase
- WormBase predicted interaction objects can be updated simply via link to Gene Orienteer
- If we want a live interaction browser, we would need to pull the data into the build process
- Do we want a single object per pair of genes? One object per instance of evidence (per paper)?
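The two modeling options raised in the last question can be compared with a small sketch. The record layout, gene names, and paper names below are placeholders invented for illustration; they do not reflect the actual WormBase data model.

```python
from collections import defaultdict

# Hypothetical evidence records: (gene_a, gene_b, paper)
evidence = [
    ("geneA", "geneB", "paper1"),
    ("geneB", "geneA", "paper2"),
    ("geneA", "geneC", "paper1"),
]


def per_evidence(records):
    """Option 1: one interaction object per instance of evidence (per paper)."""
    return [{"genes": (a, b), "paper": paper} for a, b, paper in records]


def per_pair(records):
    """Option 2: one interaction object per gene pair, collecting all papers."""
    grouped = defaultdict(list)
    for a, b, paper in records:
        # Sort the pair so (geneA, geneB) and (geneB, geneA) share one object.
        grouped[tuple(sorted((a, b)))].append(paper)
    return dict(grouped)


if __name__ == "__main__":
    print(len(per_evidence(evidence)), "objects with one object per evidence")
    print(len(per_pair(evidence)), "objects with one object per gene pair")
```

The per-pair form yields fewer objects but has to treat gene order as irrelevant, which may not hold for directional interactions; the per-evidence form keeps each paper's claim separate.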
November 10, 2011
Worm Publication is under review
NAR paper is published - has DOI
- Williams et al. has to be added to the bibliography
Citace, Development Release
- Cron jobs/scripts used to update the database, but the process now requires manual manipulation
- Manual versus automated updating of database
- Need a plan for the future
- Migration to EBI affecting the process; will things change again?
- Where will WS release files exist?
- Collecting feedback from curators
- Curators need to review results
Motifs in WormBase
- Xiaodong working on it
- Discuss with Todd the integrated model of 4 types (physical, genetic, predicted, regulatory)
- Change WBID names (WBInteraction versus WBPhysicalInteraction, WBGeneticInteraction, WBRegulatoryInteraction, WBPredictedInteraction)
- Put together series of example data objects; send to Todd/Paul to test out
- Need Interaction_type tag to distinguish the 4 basic types
- Types will need to be parsed/extracted after the build for FTP download etc.
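A minimal sketch of the post-build extraction step, assuming a single integrated interaction class whose objects carry an Interaction_type tag. The dictionary layout, the tag values, and the example object IDs are assumptions made for illustration only.

```python
# The four basic types named in the discussion; exact tag values are assumed.
INTERACTION_TYPES = ("Physical", "Genetic", "Predicted", "Regulatory")


def split_by_type(objects):
    """Group interaction objects by their Interaction_type tag."""
    buckets = {t: [] for t in INTERACTION_TYPES}
    for obj in objects:
        itype = obj["Interaction_type"]
        if itype not in buckets:
            raise ValueError(f"unknown Interaction_type: {itype!r}")
        buckets[itype].append(obj)
    return buckets


if __name__ == "__main__":
    # Hypothetical example objects sharing one WBInteraction-style ID space.
    objects = [
        {"id": "WBInteraction_example_1", "Interaction_type": "Physical"},
        {"id": "WBInteraction_example_2", "Interaction_type": "Genetic"},
        {"id": "WBInteraction_example_3", "Interaction_type": "Physical"},
    ]
    for itype, group in split_by_type(objects).items():
        print(itype, [o["id"] for o in group])
```

Each bucket could then be written to its own file for FTP download, so users who only want one type do not need to parse the integrated set.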
Human Disease Relevance Descriptions
- Started going in from last build
- Ranjana will discuss with Web Team
- Disease-paper connections
November 17, 2011
Handling 50-100+ genomes
- Annotation, gene finding, storage, handling
- ~1 week to build all genome browsers and BLAT/BLAST servers
- Homology run takes longer; from Compara pipeline
- Species-by-species build processes?
- Look up whether the genome is in ACEDB; if not, look up gene models in GFF databases
- What is needed:
- Need eyes to look at current pages
- Suggestions as to what is needed on individual pages
- Currently not adding a lot of new features
- Data models browsable/searchable?
- If we provide AQL, need to provide data models
- Intermine will eventually make AQL obsolete(?)
- Run AQL queries from a local ACEDB instance for better performance (and fewer timeouts)
- Timeout limit currently set to 2 minutes on AQL queries
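The species lookup order mentioned above (ACEDB first, then GFF databases) could look something like the sketch below. Plain dictionaries stand in for the real ACEDB and GFF stores, and the function name, species names, and gene model names are hypothetical.

```python
def find_gene_models(species, acedb_genomes, gff_databases):
    """Return (source, gene_models), preferring ACEDB over GFF databases.

    Both stores are plain dicts mapping species name to a list of gene
    models; they are stand-ins for the real databases in this sketch.
    """
    if species in acedb_genomes:
        return "ACEDB", acedb_genomes[species]
    if species in gff_databases:
        return "GFF", gff_databases[species]
    # Species not found in either store.
    return None, []


if __name__ == "__main__":
    acedb = {"species_in_acedb": ["model1", "model2"]}
    gff = {"species_in_gff_only": ["model3"]}
    for name in ("species_in_acedb", "species_in_gff_only", "unknown_species"):
        print(name, find_gene_models(name, acedb, gff))
```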
Expression pattern curation
- Confidence flags
- Ex: expressed in HSNs, in HSNL
- Annotate to parent term unless explicitly (and confidently) stated observed in child term/object
- Consistency of curation?
- Discussion between Raymond, Wen, Daniela, etc.
- Want list of ALL genes expressed in a cell (with some cutoff)
- Also want list of genes expressed ONLY in that cell and not others
- New technology generating paradigm shift; how to best represent data
- Categorize data sets (based on methods)
- Use evidence codes (text?) to categorize (ECO?)
- To pull out ncRNA genes and GO terms (Sarah Burge)
- Need to exclude protein-coding genes from query
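The two gene lists requested in the expression pattern discussion above (all genes expressed in a cell with some cutoff, and genes expressed only in that cell) can be sketched as set operations. The confidence scores, the cutoff value, and the gene names are invented for illustration; only the cell names HSNL, HSNR and AVA are real C. elegans neurons.

```python
# Hypothetical data: gene -> {cell: confidence score between 0 and 1}
expression = {
    "geneA": {"HSNL": 0.9, "HSNR": 0.8},
    "geneB": {"HSNL": 0.7},
    "geneC": {"HSNL": 0.2, "AVA": 0.9},
}


def cells_above_cutoff(scores, cutoff):
    """Cells in which a gene is considered expressed at the given cutoff."""
    return {cell for cell, score in scores.items() if score >= cutoff}


def genes_expressed_in(cell, data, cutoff):
    """ALL genes expressed in the cell, possibly among other cells."""
    return sorted(g for g, s in data.items()
                  if cell in cells_above_cutoff(s, cutoff))


def genes_expressed_only_in(cell, data, cutoff):
    """Genes whose above-cutoff expression is restricted to this one cell."""
    return sorted(g for g, s in data.items()
                  if cells_above_cutoff(s, cutoff) == {cell})


if __name__ == "__main__":
    print(genes_expressed_in("HSNL", expression, 0.5))
    print(genes_expressed_only_in("HSNL", expression, 0.5))
```

Note that the "only in that cell" list depends on the cutoff in both directions: raising it can remove a gene from the cell's list, but it can also make a gene look cell-specific by dropping its weaker expression elsewhere.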
GO Meeting (Kimberly)
- A lot of changes to be put in place
- Infrastructure, annotation changes - we'll need to sort out what this means for WormBase
- Annotation: "Annotation extension" column to add additional info
- Make explicit annotations for gene products
- Next meeting in February at Stanford (focused annotation meeting)
- Work out specifications for common annotation framework
- Are we switching over to BioMart-run WormMart?
- OICR crew working on new data-loading tool
- Update still pending...
- Check modmine (http://intermine.modencode.org/)
Reviews in on SVM paper
- Maybe done in 3 weeks
- Reviewer wanted more metrics