WBConfCall 2011.11.03-Agenda and Minutes
From WormBaseWiki
Latest revision as of 20:57, 3 November 2011
WormBase Site-Wide Conference Call Meeting Minutes
November 3rd, 2011
modENCODE Update
- Mark from DCC
- Status of most recent data freeze
- Potentially 3 consortium-wide papers in the works, involving flies, worms, and humans:
  - 1- Gene regulation networks paper
  - 2- Transcriptome paper
  - 3- Chromatin paper
- Data providers trying to get data in
- Deadline was midnight on Monday (Halloween)
- 409 submissions are being released over the next month; they will then go into modMine
- New tracks will be on WormBase
- Lead project, potentially 38 datasets
- Snyder, Gerstein - chromatin/transcription factors
- Waterston: a couple hundred (~233) datasets being vetted:
  - RNASeq datasets
  - briggsae, remanei, japonica, brenneri alignments
  - Transcript profiling of embryos: done before, now from the 4-cell stage onward, every 30 minutes
    - Spot-check the number of cells at the peak of time points?
    - To align with other data in WormBase, timing needs to be calibrated
    - Hopefully the embryos are accurately staged
    - e.g., "2 hours" vs. "80-cell stage"
  - daf-2, him-8, fem-2, etc. mutants
  - pathogen-infected strains
- Last data will come in on March 31, 2012; after that, 10 labs (data providers) will have no more funding
- Waterston et al will be providing data up to the very last day
- DCC is being funded longer (March 31, 2013?)
- DCC working on archiving the data and making data usable, accessible before closing shop
- DCC has to clean up after the big papers, e.g., supplemental analyses that were not provided adequately
- Data analysis protocols will be documented on the DCC Wiki
Todd's Points
- 1) Welcome Xiaoqi back to OCIR team!
- 2) Reconsider strategy for genome Tiers
- Discontinuity between species with regard to the data we capture
- WormBase does not need to be C. elegans-specific
- A user wanted to see a C. angaria page, but there are no Gene IDs for C. angaria
- Pages should be the same across species
- A couple of possible options:
  - 1) Discard the Tiers strategy, OR
  - 2) Keep it internal
- Can ACEDB handle the Tiers system?
- Maintaining identifiers between builds is a fairly major commitment
- Manual curation associated with gene identifiers
- What happens if we don't maintain unique identifiers? Versions, more digits on ID?
- Assemblies we have: each gene has an ID, just not a WBGene ID
- Treat a genome as just a genome
- Don't need to worry about conditional checks for each species
  - OK for build and website?
  - Website, yes
  - Build: we don't have all the data for Tier III species like we do for Tier II species, etc.
  - Todd: absence of data isn't a problem
- Curatorially keep separate, but from user perspective keep consistent
- Need to consider this before substantially scaling up the number of genomes
- Build may require more thought
- Different data types will pop up for other species, e.g., RNASeq data for other Caenorhabditis
- Erich Schwarz planning to do a reassembly of C. angaria
- Brugia will get reassembled
- Tiering, to some degree, is just internal; have tried to keep Tier system away from users
- For non-core species, objects will not stick around in their current state
- Generally, genomes will not be reanalyzed substantially
- Two threads of conversation:
  - 1) What does an identifier mean? We enter into a contract with users when providing identifiers
  - 2) Some genomes get identifiers, others don't; this should be addressed
- Virtually no work on the build side to add many genomes with GFF files, etc.
- How do we develop concise descriptions, build pathways without IDs?
- What do we plan to manually curate?
- May want to see C. elegans concise descriptions on other genomes' gene pages (orthologs)?
- We could get identifiers from third parties; their datasets already have identifiers
- Instead of applying WBGene IDs, use existing IDs from data providers
- Problem for new assemblies; actively reworking assemblies
- Part of data submission standards; if you re-annotate genome, mandate mapping IDs over?
- Stable genomes may not be an issue
- How onerous is remapping the genome?
- Add fine print to the contract with the community: users would be required to do the remapping
- Let gene ID perish? Keep sequence for gene model, but don't re-use it?