Difference between revisions of "WBConfCall 2011.11.03-Agenda and Minutes"
Revision as of 20:56, 3 November 2011
WormBase Site-Wide Conference Call Meeting Minutes
November 3rd, 2011
modENCODE Update
- Mark from DCC
- Status of most recent data freeze
- Potentially 3 consortium-wide papers in the works, involving flies, worms, and humans:
1- Gene regulation networks paper
2- Transcriptome paper
3- Chromatin paper
- Data providers trying to get data in
- Deadline was midnight on Monday (Halloween)
- 409 submissions being prepared for release over the next month; they will then go into modMine
- New tracks will be on WormBase
- Lead project, potentially 38 datasets
- Snyder, Gerstein - chromatin/transcription factors
- Waterston: a couple hundred (~233) datasets being vetted:
  - RNASeq datasets
  - C. briggsae, C. remanei, C. japonica, C. brenneri alignments
  - Transcript profiling of embryos: done before, now from the 4-cell stage onward, every 30 minutes
    - Spot-check the number of cells at the peak of time points?
    - To align with other data in WormBase, timing needs to be calibrated
    - Hopefully embryos are accurately staged
    - Example: "2 hours" vs. "80-cell stage"
  - daf-2, him-8, fem-2, etc. mutants
  - Pathogen-infected strains
- March 31, 2012: last data will come in; funding ends for 10 labs (data providers)
- Waterston et al. will provide data up to the very last day
- DCC is funded longer (until March 31, 2013?)
- DCC working on archiving the data and making it usable and accessible before closing shop
- DCC has to clean up after the big papers, e.g. supplemental analyses that were not adequately provided
- Data analysis protocols will be documented on the DCC Wiki
Todd's Points
1) Welcome Xiaoqi back to the OICR team!
2) Reconsider strategy for genome Tiers
- Discontinuity between species with regard to the data we capture
- WormBase does not need to be C. elegans-specific
- User wanted to see C. angaria page, but no Gene IDs for angaria
- Pages should be the same across species
A couple of possible options:
1) Dispense with the Tiers strategy, OR
2) Keep it internal
- Can ACEDB handle the Tiers system?
- Maintaining identifiers between builds is a fairly major commitment
- Manual curation associated with gene identifiers
- What happens if we don't maintain unique identifiers? Versions, more digits on ID?
- In the assemblies we have, each gene has an ID, just not a WBGene ID
- Treat a genome as just a genome
- Don't need to worry about conditional checks for each species
  - OK for build and website?
  - Website, yes
  - Build: we don't have all the data for Tier III species like we do for Tier II species, etc.
  - Todd: absence of data isn't a problem
- Curatorially keep separate, but from user perspective keep consistent
- Need to consider this before substantially scaling up the number of genomes
- Build may require more thought
- Different data types will pop up for other species, e.g. RNASeq data for other Caenorhabditis
- Erich Schwarz planning to do a reassembly of C. angaria
- Brugia will get reassembled
- Tiering, to some degree, is just internal; have tried to keep Tier system away from users
- For non-core species, objects will not persist in their current state
- Generally, genomes will not be reanalyzed substantially
Two threads of conversation:
1) What does an identifier mean? We enter into a contract with users when providing identifiers
2) Some genomes get identifiers, others don't; this should be addressed
- Virtually no work on the build side to add many genomes with GFF files, etc.
- How do we develop concise descriptions and build pathways without IDs?
- What do we plan to manually curate?
- May want to see C. elegans concise descriptions on other genomes' gene pages (orthologs)?
- We could get identifiers from third parties; datasets already have identifiers
- Instead of applying WBGene IDs, use existing IDs from data providers
- Problem for new assemblies that are actively being reworked
- Make it part of data submission standards: if you re-annotate a genome, mandate mapping IDs over?
- Stable genomes may not be an issue
- How onerous is remapping the genome?
- Add fine print to the contract with the community stating that users are responsible for remapping
- Let gene ID perish? Keep sequence for gene model, but don't re-use it?