WBConfCall 2011.11.03-Agenda and Minutes

From WormBaseWiki
Revision as of 20:54, 3 November 2011 by Cgrove

WormBase Site-Wide Conference Call Meeting Minutes

November 3rd, 2011

modENCODE Update

- Mark from DCC
- Status of most recent data freeze
- Potentially 3 consortium wide papers in works, involving flies, worms, humans:
1- Gene regulation networks paper
2- Transcriptome paper
3- Chromatin paper
- Data providers trying to get data in
- Deadline was midnight on Monday (Halloween)
- 409 submissions working on releasing (working on over next month); will then go into modMine
- New tracks will be on WormBase
- Lead project, potentially 38 datasets
- Snyder, Gerstein - chromatin/transcription factors
- Waterston couple hundred (~233) datasets being vetted:
* RNASeq datasets
* briggsae, remanei, japonica, brenneri alignments
* Transcript profiling of embryos before, now 4-cell stage and on, every 30 minutes
** Spot checking number of cells at peak of time points?
** Aligned with other data in WormBase, timing needs to be calibrated
** Hopefully embryos accurately staged
** Example, "2 hours" vs "80-cell stage"
* daf-2, him-8, fem-2, etc. mutants
* pathogen-infected strains
- March 31, 2012 last data will come in; 10 labs (data providers) no more funding
- Waterston et al will be providing data up to the very last day
- DCC is being funded longer (March 31, 2013?)
- DCC working on archiving the data and making data usable, accessible before closing shop
- DCC has to clean up after big papers; supplemental analyses that were not provided adequately
- Data analyses' protocols will be documented on DCC Wiki


Todd's Points

1) Welcome Xiaoqi back to OICR team!
2) Reconsider strategy for genome Tiers
- Discontinuity between species with regards to the data we capture
- WormBase does not need to be C. elegans-specific
- User wanted to see C. angaria page, but no Gene IDs for angaria
- Pages should be the same across species

A couple of possible options:
1) Discard the Tiers strategy OR
2) Keep it internal

- Can ACEDB handle the Tiers system?
- Maintaining identifiers between builds is a fairly major commitment
- Manual curation associated with gene identifiers
- What happens if we don't maintain unique identifiers? Versions, more digits on ID?

- Assemblies we have, each gene has an ID, just not WBGene IDs
- Treat a genome as just a genome
- Don't need to worry about conditional checks for each species
* OK for build and website?
* Website, yes
* Build - we don't have all the data for Tier III species like we do Tier II species, etc.
* Todd - Absence of data isn't a problem
- Curatorially keep separate, but from user perspective keep consistent
- Need to consider before majorly scaling up number of genomes
- Build may require more thought
- Different data types will pop up for other species; e.g. RNASeq data for other Caenorhabditis
- Erich Schwarz planning to do a reassembly of C. angaria
- Brugia will get reassembled
- Tiering, to some degree, is just internal; have tried to keep Tier system away from users
- Non-core species, objects will not stick around in current state
- Generally, genomes will not be reanalyzed substantially

Two threads of conversation
1) What does an identifier mean? Entering into contract with users when providing identifiers
2) Some genomes get identifiers, others don't; this should be addressed

- Virtually no work on the build side to add many genomes with GFF files, etc.
- How do we develop concise descriptions, build pathways without IDs?
- What do we plan to manually curate?
- May want to see C. elegans concise descriptions on other genomes' gene pages (orthologs)?

- We could: get from third parties, datasets already have identifiers
- Instead of applying WBGene IDs, use existing IDs from data providers
- Problem for new assemblies; actively reworking assemblies
- Part of data submission standards; if you re-annotate genome, mandate mapping IDs over?
- Stable genomes may not be an issue
- How onerous is remapping the genome?
- Add fine print to contract with community, requires user remapping
- Let gene ID perish? Keep sequence for gene model, but don't re-use it?