WBConfCall 2011.11.03-Agenda and Minutes

From WormBaseWiki
Revision as of 20:57, 3 November 2011 by Cgrove

WormBase Site-Wide Conference Call Meeting Minutes

November 3rd, 2011

modENCODE Update

  • Mark from DCC
  • Status of most recent data freeze
  • Potentially 3 consortium-wide papers in the works, involving flies, worms, and humans:
    • 1- Gene regulation networks paper
    • 2- Transcriptome paper
    • 3- Chromatin paper
  • Data providers trying to get data in
  • Deadline was midnight on Monday (Halloween)
  • 409 submissions to be released over the next month; they will then go into modMine
  • New tracks will be on WormBase
  • Lieb project, potentially 38 datasets
  • Snyder, Gerstein - chromatin/transcription factors
  • Waterston: a couple hundred (~233) datasets being vetted:
    • RNASeq datasets
    • briggsae, remanei, japonica, brenneri alignments
    • Transcript profiling of embryos: done before, now from the 4-cell stage onward, every 30 minutes
      • Spot checking number of cells at peak of time points?
      • To align with other data in WormBase, timing needs to be calibrated
      • Hopefully embryos are accurately staged
      • For example, "2 hours" vs. "80-cell stage"
    • daf-2, him-8, fem-2, etc. mutants
    • pathogen-infected strains
  • Last data will come in by March 31, 2012; after that, 10 labs (data providers) will have no more funding
  • Waterston et al. will be providing data up to the very last day
  • DCC is being funded longer (March 31, 2013?)
  • DCC working on archiving the data and making it usable and accessible before closing shop
  • DCC has to clean up after the big papers, e.g. supplemental analyses that were not adequately provided
  • Data analysis protocols will be documented on the DCC Wiki

Todd's Points

  • 1) Welcome Xiaoqi back to the OICR team!
  • 2) Reconsider strategy for genome Tiers
  • Discontinuity between species with regard to the data we capture
  • WormBase does not need to be C. elegans-specific
  • A user wanted to see a C. angaria page, but there are no Gene IDs for C. angaria
  • Pages should be the same across species
  • A couple of possible options:
    • 1) Do away with the Tiers strategy, OR
    • 2) Keep it internal
  • Can ACEDB handle the Tiers system?
  • Maintaining identifiers between builds is a fairly major commitment
  • Manual curation associated with gene identifiers
  • What happens if we don't maintain unique identifiers? Versions, more digits on ID?
  • In the assemblies we have, each gene has an ID, just not a WBGene ID
  • Treat a genome as just a genome
  • Don't need to worry about conditional checks for each species
    • OK for build and website?
    • Website, yes
    • Build - we don't have all the data for Tier III species like we do Tier II species, etc.
    • Todd - Absence of data isn't a problem
  • Curatorially keep separate, but from user perspective keep consistent
  • Need to consider this before substantially scaling up the number of genomes
  • Build may require more thought
  • Different data types will pop up for other species, e.g. RNASeq data for other Caenorhabditis
  • Erich Schwarz planning to do a reassembly of C. angaria
  • Brugia will get reassembled
  • Tiering, to some degree, is just internal; have tried to keep Tier system away from users
  • For non-core species, objects will not stick around in their current state
  • Generally, genomes will not be reanalyzed substantially
  • Two threads of conversation:
    • 1) What does an identifier mean? We enter into a contract with users when we provide identifiers
    • 2) Some genomes get identifiers, others don't; this should be addressed
  • Virtually no work on the build side to add many genomes with GFF files, etc.
  • How do we develop concise descriptions, build pathways without IDs?
  • What do we plan to manually curate?
  • May want to see C. elegans concise descriptions on other genomes' gene pages (orthologs)?
  • We could get identifiers from third parties; their datasets already have identifiers
  • Instead of applying WBGene IDs, use existing IDs from data providers
  • Problem for new assemblies, which are actively being reworked
  • Part of data submission standards: if you re-annotate a genome, mandate mapping the IDs over?
  • Stable genomes may not be an issue
  • How onerous is remapping the genome?
  • Add fine print to the contract with the community stating that users are responsible for remapping?
  • Let gene ID perish? Keep sequence for gene model, but don't re-use it?