Difference between revisions of "WBConfCall 2011.11.03-Agenda and Minutes"

Revision as of 20:56, 3 November 2011

WormBase Site-Wide Conference Call Meeting Minutes

November 3rd, 2011

modENCODE Update

*Mark from DCC
*Status of most recent data freeze
*Potentially 3 consortium-wide papers in the works, involving flies, worms, and humans:

1. Gene regulation networks paper
2. Transcriptome paper
3. Chromatin paper

*Data providers are trying to get data in
*Deadline was midnight on Monday (Halloween)
*409 submissions are being released over the next month; they will then go into modMine
*New tracks will be on WormBase
*Lead project: potentially 38 datasets
*Snyder, Gerstein - chromatin/transcription factors
*Waterston: a couple hundred (~233) datasets being vetted:

**RNASeq datasets
**briggsae, remanei, japonica, and brenneri alignments
**Transcript profiling of embryos (done previously; now from the 4-cell stage onward, every 30 minutes)
***Spot checking the number of cells at the peak of time points?
***To align with other data in WormBase, timing needs to be calibrated
***Hopefully embryos are accurately staged
***e.g., "2 hours" vs. "80-cell stage"
**daf-2, him-8, fem-2, etc. mutants
**Pathogen-infected strains

*March 31, 2012: last data will come in; 10 labs (data providers) will have no more funding
*Waterston et al. will be providing data up to the very last day
*DCC is funded for longer (until March 31, 2013?)
*DCC is working on archiving the data and making it usable and accessible before closing shop
*DCC has to clean up after the big papers, e.g., supplemental analyses that were not provided adequately
*Data analysis protocols will be documented on the DCC wiki

Todd's Points

1. Welcome Xiaoqi back to the OICR team!
2. Reconsider strategy for genome Tiers

*Discontinuity between species with regard to the data we capture
*WormBase does not need to be C. elegans-specific
*A user wanted to see a C. angaria page, but there are no Gene IDs for C. angaria
*Pages should be the same across species

A couple of possible options:

1. Discard the Tiers strategy, or
2. Keep it internal

*Can ACEDB handle the Tiers system?
*Maintaining identifiers between builds is a fairly major commitment
*Manual curation is associated with gene identifiers
*What happens if we don't maintain unique identifiers? Versions, more digits on the ID?

*Assemblies we have: each gene has an ID, just not a WBGene ID
*Treat a genome as just a genome
*Don't need to worry about conditional checks for each species

**OK for build and website?
**Website, yes
**Build - we don't have all the data for Tier III species that we have for Tier II species, etc.
**Todd - Absence of data isn't a problem

*Keep separate curatorially, but consistent from the user's perspective
*Need to consider this before substantially scaling up the number of genomes
*Build may require more thought
*Different data types will pop up for other species, e.g., RNASeq data for other Caenorhabditis
*Erich Schwarz is planning to do a reassembly of C. angaria
*Brugia will get reassembled
*Tiering, to some degree, is just internal; we have tried to keep the Tier system away from users
*Non-core species: objects will not stick around in their current state
*Generally, genomes will not be reanalyzed substantially

Two threads of conversation:

1. What does an identifier mean? Providing identifiers means entering into a contract with users
2. Some genomes get identifiers and others don't; this should be addressed

*Virtually no work on the build side to add many genomes with GFF files, etc.
*How do we develop concise descriptions and build pathways without IDs?
*What do we plan to manually curate?
*May want to see C. elegans concise descriptions on other genomes' gene pages (orthologs)?

*We could get identifiers from third parties; their datasets already have identifiers
*Instead of applying WBGene IDs, use existing IDs from data providers
*Problem for new assemblies, which are being actively reworked
*Make it part of data submission standards: if you re-annotate a genome, mandate mapping IDs over?
*Stable genomes may not be an issue
*How onerous is remapping the genome?
*Add fine print to the contract with the community: requires users to remap
*Let a gene ID perish? Keep the sequence for the gene model, but don't re-use the ID?
