Converting GFF2 to GFF3

From WormBaseWiki

GFF data is provided to CSHL as part of the WormBase build. The data arrives in GFF2 format and is converted to GFF3 on the CSHL side.

Conversion script: $WORMBASE/util/import_export/


./ elegansWSXXX.gff2 | gzip -c > elegansWSXXX.gff3.gz

TH: NOTE! The command above does not match the actual script. The correct invocation is:

./ -gff elegansWSXXX.gff2 -species | gzip -c > elegansWSXXX.gff3.gz

GFF3 requires that feature types be labelled with terms from an ontology, and that the relationships between features (part_of relationships) conform to the relationships defined in that ontology.
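As a rough illustration of what this constraint means in practice, the following Python sketch checks the type column and the Parent attributes of a few GFF3 lines against a small ontology. The term set, the part_of pairs and the function names are hypothetical and hand-written for this example; they are not taken from a SOFA release or from the conversion script. A real check would also need to follow is_a inheritance in the ontology.

```python
# Hypothetical, hand-written ontology fragment (assumption, not a SOFA excerpt).
ALLOWED_TYPES = {"gene", "mRNA", "exon", "CDS"}
# (child type, parent type) pairs permitted as part_of relationships.
PART_OF = {("mRNA", "gene"), ("exon", "mRNA"), ("CDS", "mRNA")}


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in column9.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_gff3(lines):
    """Return a list of (line_number, message) problems."""
    problems = []
    type_by_id = {}
    records = []
    # First pass: check feature types and remember the type of every ID.
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append((number, "expected 9 tab-separated columns"))
            continue
        ftype = cols[2]
        attrs = parse_attributes(cols[8])
        if ftype not in ALLOWED_TYPES:
            problems.append((number, "unknown feature type: " + ftype))
        if "ID" in attrs:
            type_by_id[attrs["ID"]] = ftype
        records.append((number, ftype, attrs))
    # Second pass: a parent may be declared after its children.
    for number, ftype, attrs in records:
        for parent_id in filter(None, attrs.get("Parent", "").split(",")):
            parent_type = type_by_id.get(parent_id)
            if parent_type is None:
                problems.append((number, "unknown parent: " + parent_id))
            elif (ftype, parent_type) not in PART_OF:
                problems.append(
                    (number, ftype + " cannot be part_of " + parent_type))
    return problems
```

Calling check_gff3 on a list of lines returns an empty list when every feature type is known and every Parent link matches an allowed part_of pair.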

The conversion script needs an ontology file for processing. As of this writing, the SOFA (Sequence Ontology Feature Annotation) ontology is used as the ontology file for GFF3 format. SOFA is a lightweight version of SO (Sequence Ontology). The two most recent releases of SOFA are available in CVS. Note that there has been recent discussion about switching from SOFA to SO for the GFF3 format. If that transition occurs, the code will need to be updated accordingly.

SOFA releases: $WORMBASE/util/import_export/sofa.rev_*.obo
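To give a feel for what such an ontology file contains, here is a minimal Python sketch that reads [Term] stanzas from OBO-formatted text and collects each term's name, is_a parents and part_of targets. The function name and the sample text are assumptions made for illustration; the sample is hand-written in OBO syntax and is not copied from a SOFA release, and the conversion script may read the file quite differently.

```python
def parse_obo_terms(text):
    """Return {term id: {"name": str, "is_a": [ids], "part_of": [ids]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            if line == "[Term]":
                current = {"name": None, "is_a": [], "part_of": []}
            else:
                current = None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        # Drop a trailing "! comment", as used after referenced ids.
        value = value.split("!", 1)[0].strip()
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
        elif tag == "relationship" and value.startswith("part_of "):
            current["part_of"].append(value.split()[1])
    return terms


# Hand-written sample in OBO syntax (illustrative only).
SAMPLE = """format-version: 1.2

[Term]
id: SO:0000704
name: gene

[Term]
id: SO:0000147
name: exon
relationship: part_of SO:0000673 ! transcript

[Typedef]
id: part_of
name: part_of
"""

terms = parse_obo_terms(SAMPLE)
```

The resulting dictionary can be used to build the set of permitted feature types (the name fields) and the permitted part_of pairs.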

There is additional documentation that describes the features available in GFF3 files:

GFF3 features (C. elegans)
GFF3 features (C. briggsae)

The conversion process requires a GFF2 file and a SOFA ontology file (if none is provided, the script defaults to the hardcoded current release). The script can handle both the C. elegans and C. briggsae GFF2 files. The species must be specified on the command line.
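The most visible syntactic difference between the two formats is the ninth column: GFF2 writes attributes as tag "value" pairs separated by semicolons, whereas GFF3 writes tag=value pairs and requires reserved characters to be percent-encoded. The Python sketch below illustrates only that step. It is not the WormBase conversion script, all function names are hypothetical, and it leaves the type column untouched, so it does not perform the ontology mapping described above. Lower-casing the tag names is a choice made here because GFF3 reserves tags that begin with an uppercase letter.

```python
import re

# Matches: tag "quoted value"   or   tag bare_value
ATTRIBUTE = re.compile(r'(\w+)\s+(?:"([^"]*)"|([^\s;]+))')


def gff3_escape(value):
    """Percent-encode characters that are reserved in GFF3 column 9."""
    # '%' comes first so the escapes added afterwards are not re-encoded.
    for ch in "%;=&,\t":
        value = value.replace(ch, "%{:02X}".format(ord(ch)))
    return value


def convert_attributes(gff2_column9):
    """Turn 'Tag "value" ; Tag2 "value2"' into 'tag=value;tag2=value2'."""
    pairs = []
    for tag, quoted, bare in ATTRIBUTE.findall(gff2_column9):
        pairs.append(tag.lower() + "=" + gff3_escape(quoted or bare))
    return ";".join(pairs)


def convert_line(gff2_line):
    """Convert one GFF2 feature line; return None for comments and blanks."""
    cols = gff2_line.rstrip("\n").split("\t")
    if gff2_line.startswith("#") or len(cols) < 8:
        return None
    column9 = cols[8] if len(cols) > 8 else ""
    # GFF3 uses "." for an empty attribute column.
    return "\t".join(cols[:8] + [convert_attributes(column9) or "."])


# Made-up example line; the source and sequence names are illustrative.
example = 'I\tcurated\texon\t100\t200\t.\t+\t.\tSequence "B0019.1" ; Note "a;b"'
converted = convert_line(example)
```

A complete GFF3 file would also begin with the ##gff-version 3 directive, which this sketch does not emit.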