Difference between revisions of "Genome Standards"

From WormBaseWiki
Revision as of 10:01, 18 May 2011

WormBase Genome Integration Standards

Overview:

This is the list of criteria that a genome assembly/project should attain before WormBase can agree to integrate the Organism into the database.

Submission of genomes to the WormBase database is a common requirement. This document is a guide for those who wish to have their nematode genomes included in WormBase.

We would like to be informed of upcoming genome submissions as early as possible. For example, a typical lab that has contributed data is the Sanger Helminth group (http://www.sanger.ac.uk/research/projects/parasitegenomics/).

They run a pipeline with four phases:

1) Production (X months) - not of interest to WormBase
      |
2) Finishing  (3 months) - not of interest to WormBase
      |
3) Analysis   (3 months) - WormBase may be interested here, depending on our criteria
      |
2) Repeat - Finishing
      |
3) Repeat - Analysis
      |
      x n (the Finishing/Analysis cycle can run several times before the genome is published)
      |
4) Publish when the genome/gene set meets their standards

Discussion

Expected number of genes

To estimate how many genes could be expected in an assembly, take the average gene length and work out the maximum number of genes that could be predicted from the available contigs (if you can estimate the intergenic percentage, include that too). Comparing this with a rough guess at the gene number (e.g. 20k) gives a percentage completeness, albeit with a huge error bar.
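A minimal Python sketch of this back-of-the-envelope estimate, assuming the genic portion of the assembly is the total contig length scaled by (1 - intergenic fraction). The function name, its parameters and the example figures are hypothetical, and capping the result at 100% is an assumption.

```python
def estimate_completeness(total_contig_length: int,
                          average_gene_length: float,
                          expected_gene_count: int,
                          intergenic_fraction: float = 0.0) -> tuple[int, float]:
    """Return (maximum predictable genes, % completeness) as a rough upper bound."""
    if not 0.0 <= intergenic_fraction < 1.0:
        raise ValueError("intergenic_fraction must be in [0, 1)")
    # Only the non-intergenic part of the contigs can hold genes.
    genic_length = total_contig_length * (1.0 - intergenic_fraction)
    max_genes = int(genic_length // average_gene_length)
    # Compare with the (rough) expected gene number; cap at 100%.
    completeness = min(100.0, 100.0 * max_genes / expected_gene_count)
    return max_genes, completeness
```

For example, a hypothetical 100 Mb assembly with an average gene length of 3 kb, 50% intergenic sequence and a guess of 20,000 genes gives at most 16,666 genes, or roughly 83% completeness.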


WormBase Standards Document

Assembly statistics

Standard Draft:

High-Quality Draft:

Improved High-Quality Draft:

Annotation-Directed Improvement:

Noncontiguous Finished:

Finished:

Attribution

Added by TH. This is entirely ad hoc, but we should capture information on whom to contact so that we can display it on the species pages.

Primary Data Contact:

Primary Data Contact Email:

Project URL:

FTP Site:

Citation:
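If these fields are stored per species, a simple record type makes it easy to see what is still missing before a species page is displayed. This is a minimal Python sketch; the class name, the method and the choice of which fields are optional are assumptions, not an agreed WormBase schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SpeciesAttribution:
    """Hypothetical per-species record mirroring the Attribution fields above."""
    primary_data_contact: str
    primary_data_contact_email: str
    project_url: Optional[str] = None
    ftp_site: Optional[str] = None
    citation: Optional[str] = None

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty (None or an empty string)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```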

Gene Models

All gene models should be at least 3 amino acids in length, mainly for BLAST analysis, as this is the minimum word size.
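A minimal Python sketch for checking this rule against a submitted FASTA file of conceptual translations. It assumes one record per ">" header and ignores a trailing "*" stop codon when counting residues; the function names are hypothetical.

```python
from typing import Iterator, TextIO

# Minimum length taken from the guideline above (BLAST's minimum word size).
MIN_PROTEIN_LENGTH = 3


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream."""
    header = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def short_models(handle: TextIO, min_length: int = MIN_PROTEIN_LENGTH) -> list[str]:
    """Return headers of translations shorter than min_length residues."""
    return [
        header
        for header, seq in read_fasta(handle)
        # Ignore a trailing stop codon when counting residues.
        if len(seq.rstrip("*")) < min_length
    ]
```

Any header returned by `short_models` identifies a gene model that falls below the minimum length.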



Minimum Standards

  • Submission to a public nucleotide repository
  • A wiki description of the species, submitted by the data producer if possible
  • N50 of ?
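The N50 threshold has not been decided yet, but the statistic itself is standard: sort the contig (or scaffold) lengths in descending order and report the length at which the running total first reaches half of the total assembly size. A minimal Python sketch (the function name is hypothetical):

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of a list of contig (or scaffold) lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")
```

For contig lengths 2, 3, ..., 10 (total 54), the running total reaches 27 at the contig of length 8, so the N50 is 8.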

Core Files

Data should be provided in standardized formats using the following conventions:

  • Genomic Sequence
      File format : FASTA
      File name   :
  • Conceptual transcripts (spliced)
      File format : FASTA
      File name   :
  • Conceptual transcripts (unspliced)
      File format :
      File name   : g_species.gff2
  • Conceptual translations
      File format : FASTA
      File name   :
  • Genomic Features
      File format :
      File name   : g_species.gff2
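These conventions could be checked automatically when files arrive. This minimal Python sketch builds the "g_species" style name shown for the GFF2 files (assuming "g" is the lower-case genus initial and "species" the species epithet) and runs basic sanity checks on one GFF2 feature line: at least 8 tab-separated columns, a 1-based start no greater than the end, a valid strand and frame, and a numeric or "." score. The function names are hypothetical, and this is not a complete GFF2 validator.

```python
def core_file_name(binomial: str, extension: str) -> str:
    """Build a 'g_species' style file name, e.g. 'c_elegans.gff2'."""
    parts = binomial.split()
    if len(parts) < 2:
        raise ValueError("expected 'Genus species'")
    genus, species = parts[0], parts[1]
    return f"{genus[0].lower()}_{species.lower()}.{extension.lstrip('.')}"


def check_gff2_line(line: str) -> list[str]:
    """Return problems found in one GFF2 feature line (empty list if none)."""
    # Blank lines and '#' comment lines carry no feature.
    if not line.strip() or line.startswith("#"):
        return []
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return [f"expected at least 8 tab-separated columns, got {len(cols)}"]
    problems = []
    start, end = cols[3], cols[4]
    if not (start.isdigit() and end.isdigit()):
        problems.append("start/end must be integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("start must be >= 1 and <= end")
    if cols[6] not in {"+", "-", "."}:
        problems.append(f"bad strand {cols[6]!r}")
    if cols[7] not in {"0", "1", "2", "."}:
        problems.append(f"bad frame {cols[7]!r}")
    if cols[5] != ".":
        try:
            float(cols[5])
        except ValueError:
            problems.append(f"bad score {cols[5]!r}")
    return problems
```

For example, `core_file_name("Caenorhabditis elegans", "gff2")` returns `"c_elegans.gff2"`.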

See Also

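The standards the Sanger Helminth Co-ordinator works to for submitting genomes are based on the paper "Genome Project Standards in a New Era of Sequencing" (Science 326(5950):236, http://www.sciencemag.org/cgi/content/full/326/5950/236#R2), which is a good starting point for WormBase's own criteria.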