WBConfCall 2011.11.03-Agenda and Minutes
WormBase Site-Wide Conference Call Meeting Minutes
November 3rd, 2011
modENCODE Update
- Mark from DCC
- Status of most recent data freeze
- Potentially 3 consortium-wide papers in the works, involving flies, worms, and humans:
- 1- Gene regulation networks paper
- 2- Transcriptome paper
- 3- Chromatin paper
- Data providers trying to get data in
- Deadline was midnight on Monday (Halloween)
- 409 submissions to be released (work will continue over the next month); they will then go into modMine
- New tracks will be on WormBase
- Lead project, potentially 38 datasets
- Snyder, Gerstein - chromatin/transcription factors
- Waterston: a couple hundred (~233) datasets being vetted:
- RNASeq datasets
- briggsae, remanei, japonica, brenneri alignments
- Transcript profiling of embryos was done before; now from the 4-cell stage on, every 30 minutes
- Spot checking number of cells at peak of time points?
- Aligned with other data in WormBase, timing needs to be calibrated
- Hopefully embryos accurately staged
- Example, "2 hours" vs "80-cell stage"
- daf-2, him-8, fem-2, etc. mutants
- pathogen-infected strains
- Last data will come in March 31, 2012; the 10 labs (data providers) will have no more funding
- Waterston et al will be providing data up to the very last day
- DCC is being funded longer (March 31, 2013?)
- DCC working on archiving the data and making data usable, accessible before closing shop
- DCC has to clean up after big papers; supplemental analyses that were not provided adequately
- Data analysis protocols will be documented on DCC Wiki
Todd's Points
- 1) Welcome Xiaoqi back to OCIR team!
- 2) Reconsider strategy for genome Tiers
- Discontinuity between species with regard to the data we capture
- WormBase does not need to be C. elegans-specific
- User wanted to see C. angaria page, but no Gene IDs for angaria
- Pages should be the same across species
- Couple possible options:
- 1) Discard the Tiers strategy OR
- 2) Keep it internal
- Can ACEDB handle the Tiers system?
- Maintaining identifiers between builds is a fairly major commitment
- Manual curation associated with gene identifiers
- What happens if we don't maintain unique identifiers? Versions, more digits on ID?
- For the assemblies we have, each gene has an ID, just not a WBGene ID
- Treat a genome as just a genome
- Don't need to worry about conditional checks for each species
- OK for build and website?
- Website, yes
- Build - we don't have all the data for Tier III species like we do Tier II species, etc.
- Todd - Absence of data isn't a problem
- Curatorially keep separate, but from user perspective keep consistent
- Need to consider before majorly scaling up number of genomes
- Build may require more thought
- Different data types will pop up for other species; e.g., RNASeq data for other Caenorhabditis
- Erich Schwarz planning to do a reassembly of C. angaria
- Brugia will get reassembled
- Tiering, to some degree, is just internal; have tried to keep Tier system away from users
- Non-core species, objects will not stick around in current state
- Generally, genomes will not be reanalyzed substantially
- Two threads of conversation
- 1) What does an identifier mean? Entering into contract with users when providing identifiers
- 2) Some genomes get identifiers, others don't; this should be addressed
- Virtually no work on the build side to add many genomes with GFF files, etc.
- How do we develop concise descriptions, build pathways without IDs?
- What do we plan to manually curate?
- May want to see C. elegans concise descriptions on other genomes' gene pages (orthologs)?
- We could: get from third parties, datasets already have identifiers
- Instead of applying WBGene IDs, use existing IDs from data providers
- Problem for new assemblies; actively reworking assemblies
- Part of data submission standards; if you re-annotate genome, mandate mapping IDs over?
- Stable genomes may not be an issue
- How onerous is remapping the genome?
- Add fine print to contract with community, requires user remapping
- Let gene ID perish? Keep sequence for gene model, but don't re-use it?
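One way to picture the remapping options raised in the last few points is the minimal Python sketch below. It is not an existing WormBase or ACEDB procedure. It assumes a data provider supplies an old-to-new ID mapping with each re-annotation; any live ID missing from that mapping is retired and never re-used. The function name `remap_gene_ids` and the IDs in the usage note are hypothetical.

```python
def remap_gene_ids(live_ids, old_to_new, retired_ids=frozenset()):
    """Carry gene IDs across a re-annotation (illustrative sketch only).

    live_ids    -- IDs in use before the re-annotation
    old_to_new  -- provider-supplied mapping {old_id: new_id} (assumed to exist)
    retired_ids -- IDs retired in earlier rounds; never handed out again

    Returns (new_live, new_retired, history), where history maps each old
    ID to its successor, or to None if the ID was allowed to perish.
    """
    new_live = set()
    new_retired = set(retired_ids)
    history = {}

    for old_id in sorted(live_ids):
        new_id = old_to_new.get(old_id)  # None means no mapping was provided
        history[old_id] = new_id
        if new_id is None:
            # No successor: retire the ID instead of re-using it later.
            new_retired.add(old_id)
        else:
            new_live.add(new_id)
            if new_id != old_id:
                # The old ID was replaced, so it is retired as well.
                new_retired.add(old_id)

    # Refuse any mapping that would bring a retired ID back to life.
    reused = new_live & new_retired
    if reused:
        raise ValueError(f"retired IDs would be re-used: {sorted(reused)}")

    return new_live, new_retired, history
```

For example, with made-up IDs, `remap_gene_ids({"g1", "g2", "g3"}, {"g1": "g1", "g2": "g2b"})` keeps `g1`, replaces `g2` with `g2b`, and retires `g2` and `g3`. The returned history is the record users would need to remap their own data, which is the "fine print" cost discussed above.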