Difference between revisions of "Genome Standards"

From WormBaseWiki

Revision as of 10:41, 18 May 2011

WormBase Genome Integration Standards


This is the list of criteria that a genome assembly/project should meet before WormBase can agree to integrate the organism into the database.

Submission of genomes to the WormBase database is a common requirement. This document is a guide for those who wish their nematode genomes to be included in WormBase.

We are happy to accept data before publication so that we can prepare it for addition to the database when you publish.


Keep us informed of your progress

Contact us at help@wormbase.org early on in your project. We can then advise you about details and be prepared for your data when you publish it.

We like to be informed of upcoming genome submissions as early as possible. For example, a typical lab that has contributed data is the Sanger Helminth group.

They run a pipeline with four phases; we would normally integrate the data after the fourth phase:

1) Production (X months) - not of interest to WormBase
2) Finishing  (3 months) - not of interest to WormBase
3) Analysis   (3 months) - WormBase is interested from this stage on; contact us at help@wormbase.org
2) Repeat - Finishing
3) Repeat - Analysis
      x n (phases 2 and 3 can be repeated for several rounds before the genome is published)
4) Publish and submit to ENA/GenBank when the genome/gene set meets their standards.
   Confirm that it meets the WormBase standards for integration, then add it to WormBase.

Submission to ENA/GenBank

Normally we expect the genome to be submitted to the public nucleotide databases.

We can then easily extract the data from the public databases, and we can take it that the authors regard the assembly as reasonably stable.

Gene model prediction

We prefer that you predict gene models for your genome.

All gene models should be at least 3 amino acids in length. This is mainly for BLAST analysis, as this is the minimum word size.
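
The length rule above can be checked before submission. The following is a minimal sketch, not a WormBase tool: it assumes the predicted proteins are available as FASTA text, and the function names `parse_fasta` and `too_short` are hypothetical helpers introduced only for illustration. A trailing `*` (a common stop-codon marker in translated output) is not counted as an amino acid here, which is also an assumption.

```python
MIN_PROTEIN_LENGTH = 3  # amino acids, taken from the guideline above


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record_id: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line
    return records


def too_short(proteins, min_length=MIN_PROTEIN_LENGTH):
    """Return the sorted IDs of proteins shorter than min_length.

    A trailing '*' is stripped first (assumption: it marks the stop codon).
    """
    return sorted(
        pid for pid, seq in proteins.items()
        if len(seq.rstrip("*")) < min_length
    )


if __name__ == "__main__":
    example = ">geneA\nMKT\n>geneB\nMK*\n>geneC\nMSTLLK\n"
    print(too_short(parse_fasta(example)))
```

Any IDs reported by `too_short` would be candidates for review or removal from the gene set before it is sent.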

Minimum N50

We do not require a minimum N50. If you are happy for your data to be publicly available from the ENA/GenBank database, then we will consider including it in WormBase, but we reserve the right to reject genomes that are too fragmented for us to be able to load into our data structures.
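
Although no threshold is set, it may help to know how fragmented an assembly is before contacting us. As a rough illustration, N50 is commonly defined as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below follows that definition; the function name `n50` and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs.
    print(n50([100, 80, 60, 40, 20]))
```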

Assembly statistics

Please give us the following information:

  • Standard Draft:
  • High-Quality Draft:
  • Improved High-Quality Draft:
  • Annotation-Directed Improvement:
  • Noncontiguous Finished:
  • Finished:


Attribution

Primary Data Contact:

Primary Data Contact Email:

Project URL:

FTP Site:


Submitted Files

If the data cannot be submitted to ENA/GenBank, it should be provided to WormBase in standardized formats using the following conventions.

In the file names below, species refers to the species name, for example c_elegans.

  • Genomic Sequence
File format : FASTA
File name   : species.genome.fa
  • Conceptual transcripts (unspliced) - the gene models
File format : GFF2 or GFF3
File name   : species.gff2 or species.gff3
  • Genomic Features (optional but preferred)
File format : GFF2 or GFF3
File name   : species.gff2 or species.gff3 (these should be in the same file as the gene models)
  • AGP file (optional but preferred)
File format : AGP
File name   : species.agp
  • Contigs file (optional but preferred)
File format : FASTA
File name   : species.contigs.fa
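
The naming conventions above can be expressed as a small helper for checking a submission directory. This is only a sketch under stated assumptions: `expected_files` and `missing_required` are hypothetical names, the species-prefix pattern is inferred from the single c_elegans example, and treating the genome and gene-model files as required (because they are not marked optional in the list) is an interpretation rather than a stated rule.

```python
import re

# Assumption: species prefixes look like the "c_elegans" example
# (genus initial, underscore, species epithet, all lowercase).
SPECIES_PATTERN = re.compile(r"^[a-z]_[a-z]+$")


def expected_files(species, gff_version=3):
    """Map each file role to (file name, required?) for a species prefix."""
    if not SPECIES_PATTERN.match(species):
        raise ValueError(f"unexpected species prefix: {species!r}")
    if gff_version not in (2, 3):
        raise ValueError("gff_version must be 2 or 3")
    return {
        "genome": (f"{species}.genome.fa", True),
        # Gene models and any genomic features share one GFF file.
        "annotation": (f"{species}.gff{gff_version}", True),
        "agp": (f"{species}.agp", False),
        "contigs": (f"{species}.contigs.fa", False),
    }


def missing_required(species, present, gff_version=3):
    """Return the required file names that are absent from `present`."""
    present = set(present)
    return sorted(
        name
        for name, required in expected_files(species, gff_version).values()
        if required and name not in present
    )


if __name__ == "__main__":
    print(missing_required("c_elegans", ["c_elegans.genome.fa"]))
```

Passing the list of file names actually present (for example from `os.listdir`) gives a quick view of what is still needed before the data is sent.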

See Also

The Helminth Co-ordinator discusses some requirements for submitting genomes in the paper "Genome Project Standards in a New Era of Sequencing"