Difference between revisions of "WBConfCall 2011.11.03-Agenda and Minutes"

From WormBaseWiki
Revision as of 20:56, 3 November 2011

WormBase Site-Wide Conference Call Meeting Minutes

November 3rd, 2011

modENCODE Update

  • Mark from DCC
  • Status of most recent data freeze
  • Potentially 3 consortium wide papers in works, involving flies, worms, humans:

1- Gene regulation networks paper
2- Transcriptome paper
3- Chromatin paper

  • Data providers trying to get data in
  • Deadline was midnight on Monday (Halloween)
  • 409 submissions are being worked on for release (over the next month); they will then go into modMine
  • New tracks will be on WormBase
  • Lead project, potentially 38 datasets
  • Snyder, Gerstein - chromatin/transcription factors
  • Waterston: a couple hundred (~233) datasets being vetted:

      • RNASeq datasets
      • briggsae, remanei, japonica, brenneri alignments
      • Transcript profiling of embryos before, now 4-cell stage and on, every 30 minutes
          • Spot checking number of cells at peak of time points?
          • Aligned with other data in WormBase, timing needs to be calibrated
          • Hopefully embryos accurately staged
          • Example, "2 hours" vs "80-cell stage"
      • daf-2, him-8, fem-2, etc. mutants
      • pathogen-infected strains

  • March 31, 2012 last data will come in; 10 labs (data providers) no more funding
  • Waterston et al will be providing data up to the very last day
  • DCC is being funded longer (March 31, 2013?)
  • DCC working on archiving the data and making data usable, accessible before closing shop
  • DCC has to clean up after big papers; supplemental analyses that were not provided adequately
  • Data analyses' protocols will be documented on DCC Wiki


Todd's Points

1) Welcome Xiaoqi back to OICR team!
2) Reconsider strategy for genome Tiers

  • Discontinuity between species with regard to the data we capture
  • WormBase does not need to be C. elegans-specific
  • User wanted to see C. angaria page, but no Gene IDs for angaria
  • Pages should be the same across species

Couple possible options:

1) Discard the Tiers strategy, OR
2) Keep it internal

  • Can ACEDB handle the Tiers system?
  • Maintaining identifiers between builds is a fairly major commitment
  • Manual curation associated with gene identifiers
  • What happens if we don't maintain unique identifiers? Versions, more digits on ID?
  • For the assemblies we have, each gene has an ID, just not a WBGene ID
  • Treat a genome as just a genome
  • Don't need to worry about conditional checks for each species

      • OK for build and website?
      • Website, yes
      • Build - we don't have all the data for Tier III species like we do Tier II species, etc.
      • Todd - Absence of data isn't a problem

  • Curatorially keep separate, but from user perspective keep consistent
  • Need to consider before majorly scaling up number of genomes
  • Build may require more thought
  • Different data types will pop up for other species; e.g., RNASeq data for other Caenorhabditis
  • Erich Schwarz planning to do a reassembly of C. angaria
  • Brugia will get reassembled
  • Tiering, to some degree, is just internal; have tried to keep Tier system away from users
  • Non-core species, objects will not stick around in current state
  • Generally, genomes will not be reanalyzed substantially

Two threads of conversation

1) What does an identifier mean? Entering into contract with users when providing identifiers
2) Some genomes get identifiers, others don't; this should be addressed

  • Virtually no work on the build side to add many genomes with GFF files, etc.
  • How do we develop concise descriptions, build pathways without IDs?
  • What do we plan to manually curate?
  • May want to see C. elegans concise descriptions on other genomes' gene pages (orthologs)?
  • We could: get from third parties, datasets already have identifiers
  • Instead of applying WBGene IDs, use existing IDs from data providers
  • Problem for new assemblies; actively reworking assemblies
  • Part of data submission standards; if you re-annotate genome, mandate mapping IDs over?
  • Stable genomes may not be an issue
  • How onerous is remapping the genome?
  • Add fine print to contract with community, requires user remapping
  • Let gene ID perish? Keep sequence for gene model, but don't re-use it?