GFF3 features (C. elegans)
This article describes the features included in the C. elegans GFF3 files. For an introduction to the GFF3 files, see the GFF3 features article.
The WormBase C. elegans GFF3 files can be downloaded from:
ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/genome_feature_tables/GFF3
The attribute tags (column 9) for each feature group are listed in parentheses. Not every attribute listed for a feature group is present on every feature within the group.
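As a rough illustration of how the column 9 attribute tags can be read, here is a minimal Python sketch. It assumes the general GFF3 convention of semicolon-separated `tag=value` pairs, comma-separated multiple values, and percent-encoded reserved characters. The function name `parse_attributes` is hypothetical and not part of any WormBase tool.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse a GFF3 attribute column into a dict of {tag: [values]}.

    Assumes semicolon-separated tag=value pairs, comma-separated
    multiple values, and percent-encoding of reserved characters.
    """
    attributes = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        # Split on commas first, then decode each value separately.
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attributes


attrs = parse_attributes("ID=000001;Name=WBGene00015062;Alias=trx-1")
print(attrs["Name"])  # ['WBGene00015062']
```

Because some attributes may be absent from individual features, `attrs.get("wormpep")` is a safer lookup than `attrs["wormpep"]`.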
Features
Protein Coding Genes
The top-level feature for a protein-coding gene is:
Gene:Coding_transcript (ID)
Protein-coding genes contain one or more:
mRNA:Coding_transcript (ID,Parent,Note,cds,prediction_status,wormpep)
mRNAs consist of:
CDS (ID,Parent=mRNA ID,Note,status,wormpep)
five_prime_UTR (ID,Parent=mRNA ID,Note,status,wormpep)
three_prime_UTR (ID,Parent=mRNA ID,Note,status,wormpep)
Here is a full example, including a spliced 3' UTR:
II Gene gene 7752566 7753442 . - . ID=000001;Name=WBGene00015062;Alias=trx-1
II Coding_transcript mRNA 7752566 7753442 . - . ID=000002;Name=B0228.5a;Parent=000001
II Coding_transcript three_prime_UTR 7752566 7752645 . - . ID=000008;Parent=000002
II Coding_transcript three_prime_UTR 7752700 7752745 . - . ID=000008;Parent=000002
II Coding_transcript CDS 7752746 7752814 . - 0 ID=000009;Parent=000002
II Coding_transcript CDS 7752869 7752940 . - 0 ID=000011;Parent=000002
II Coding_transcript CDS 7753212 7753418 . - 0 ID=000013;Parent=000002
II Coding_transcript five_prime_UTR 7753419 7753442 . - . ID=000014;Parent=000002
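The gene, mRNA and part relationships in the example above can be recovered by following the Parent attributes. The following Python sketch shows one way to do that. The helper name `children_by_parent` is hypothetical, and the example rows are split on whitespace only because they are shown space-separated here; real GFF3 files are tab-delimited.

```python
from collections import defaultdict

# The trx-1 example from this article. Columns are separated by single
# spaces here for readability; real GFF3 files are tab-delimited.
EXAMPLE = """\
II Gene gene 7752566 7753442 . - . ID=000001;Name=WBGene00015062;Alias=trx-1
II Coding_transcript mRNA 7752566 7753442 . - . ID=000002;Name=B0228.5a;Parent=000001
II Coding_transcript three_prime_UTR 7752566 7752645 . - . ID=000008;Parent=000002
II Coding_transcript three_prime_UTR 7752700 7752745 . - . ID=000008;Parent=000002
II Coding_transcript CDS 7752746 7752814 . - 0 ID=000009;Parent=000002
II Coding_transcript CDS 7752869 7752940 . - 0 ID=000011;Parent=000002
II Coding_transcript CDS 7753212 7753418 . - 0 ID=000013;Parent=000002
II Coding_transcript five_prime_UTR 7753419 7753442 . - . ID=000014;Parent=000002
"""


def children_by_parent(text):
    """Map each Parent ID to a list of (type, start, end) child features."""
    children = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.split(None, 8)  # use line.split("\t") for real files
        ftype, start, end, attrs = cols[2], int(cols[3]), int(cols[4]), cols[8]
        tags = dict(pair.split("=", 1) for pair in attrs.split(";") if pair)
        for parent in tags.get("Parent", "").split(","):
            if parent:
                children[parent].append((ftype, start, end))
    return children


children = children_by_parent(EXAMPLE)
print(children["000001"])  # [('mRNA', 7752566, 7753442)]

# Total coding length of the mRNA: sum of its CDS segments (inclusive coordinates).
cds_length = sum(end - start + 1
                 for ftype, start, end in children["000002"]
                 if ftype == "CDS")
print(cds_length)  # 348
```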
Display Notes
Glyph          + intron exons   + CDS
gene           works            fails
so_transcript  fails            works
Predicted and Retired Genes
Predicted and retired genes are denoted by changing the source of the CDS. (Open question: should this be the source of the gene instead?)
CDS:GeneMarkHMM (ID)
CDS:Genefinder (ID)
CDS:history (ID)
CDS:twinscan (ID)
- Non-coding genes (some predicted): a Gene feature and an ncRNA feature are included. The source field for this group varies as listed below.
Genes:
gene:Non_coding_transcript (ID)
gene:RNAz (ID)
gene:miRNA (ID)
gene:ncRNA (ID)
gene:rRNA (ID)
gene:scRNA (ID)
gene:snRNA (ID)
gene:snlRNA (ID)
gene:snoRNA (ID)
gene:tRNA (ID)
gene:tRNAscan-SE-1.23 (ID)
ncRNAs:
ncRNA:Non_coding_transcript (ID,Parent,Note)
ncRNA:RNAz (ID,Parent,Note)
ncRNA:miRNA (ID,Parent,Note)
ncRNA:ncRNA (ID,Parent,Note)
ncRNA:rRNA (ID,Parent,Note)
ncRNA:scRNA (ID,Parent,Note)
ncRNA:snRNA (ID,Parent,Note)
ncRNA:snlRNA (ID,Parent,Note)
ncRNA:snoRNA (ID,Parent,Note)
ncRNA:tRNA (ID,Parent,Note)
ncRNA:tRNAscan-SE-1.23 (ID,Parent)
ncRNA features are parts of the Gene features.
The following features are parts of the mRNA features.
exon:Non_coding_transcript (Parent)
exon:miRNA (Parent)
exon:ncRNA (Parent)
exon:rRNA (Parent)
exon:scRNA (Parent)
exon:snRNA (Parent)
exon:snlRNA (Parent)
exon:snoRNA (Parent)
exon:tRNA (Parent)
exon:tRNAscan-SE-1.23 (Parent)
intron:Coding_transcript (Parent,confirmed_cdna,confirmed_est,confirmed_false,confirmed_homology,confirmed_inconsistent,confirmed_unknown,confirmed_utr)
intron:Non_coding_transcript (Parent,confirmed_cdna,confirmed_est,confirmed_false,confirmed_utr)
intron:ncRNA (Parent,confirmed_est)
- Alignment features: These features consist of multiple parts unified by a common ID attribute.
EST_match:BLAT_EST_BEST (ID,Target)
EST_match:BLAT_EST_OTHER (ID,Target)
RNAi_reagent:RNAi_primary (ID,Target)
RNAi_reagent:RNAi_secondary (ID,Target)
cDNA_match:BLAT_mRNA_BEST (ID,Target)
cDNA_match:BLAT_mRNA_OTHER (ID,Target)
expressed_sequence_match:BLAT_OST_BEST (ID,Target)
expressed_sequence_match:BLAT_OST_OTHER (ID,Target)
nucleotide_match:BLAT_TC1_BEST (ID,Target)
nucleotide_match:BLAT_TC1_OTHER (ID,Target)
nucleotide_match:BLAT_ncRNA_BEST (ID,Target)
nucleotide_match:BLAT_ncRNA_OTHER (ID,Target)
nucleotide_match:TEC_RED (ID,Target)
nucleotide_match:waba_coding (ID,Target)
nucleotide_match:waba_strong (ID,Target)
nucleotide_match:waba_weak (ID,Target)
protein_match:wublastx (ID,Target)
reagent:Expr_pattern (ID,Target)
repeat_region:RepeatMasker (ID,Target)
translated_nucleotide_match:BLAT_NEMATODE (ID,Target,species)
translated_nucleotide_match:BLAT_NEMBASE (ID,Target,species)
translated_nucleotide_match:BLAT_WASHU (ID,Target,species)
translated_nucleotide_match:mass_spec_genome (ID,Target,Note,cds_matches,protein_matches,times_observed)
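Because the parts of an alignment feature share one ID, reassembling an alignment means grouping rows by that attribute. The Python sketch below illustrates the idea. The rows, IDs and Target names are invented for illustration and do not come from a WormBase file, and `group_by_id` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical rows: two segments of one EST alignment plus a second,
# single-segment alignment. IDs and Target names are made up.
ROWS = [
    ("I", "BLAT_EST_BEST", "EST_match", "1000", "1100", ".", "+", ".",
     "ID=hypothetical_est_1;Target=EST_A 1 101"),
    ("I", "BLAT_EST_BEST", "EST_match", "1500", "1650", ".", "+", ".",
     "ID=hypothetical_est_1;Target=EST_A 102 252"),
    ("I", "BLAT_EST_BEST", "EST_match", "3000", "3080", ".", "-", ".",
     "ID=hypothetical_est_2;Target=EST_B 1 81"),
]
LINES = ["\t".join(row) for row in ROWS]


def group_by_id(lines):
    """Collect the (start, end) segments of each multi-part feature by ID."""
    groups = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature row
        tags = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
        if "ID" in tags:
            groups[tags["ID"]].append((int(cols[3]), int(cols[4])))
    # Sort segments by genomic start so each alignment reads left to right.
    return {feature_id: sorted(parts) for feature_id, parts in groups.items()}


alignments = group_by_id(LINES)
print(alignments["hypothetical_est_1"])  # [(1000, 1100), (1500, 1650)]
```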
- Other: The remaining features are listed below.
PCR_product:GenePair_STS (pcr_product)
PCR_product:Orfeome (amplified,pcr_product)
PCR_product:Promoterome (pcr_product)
SAGE_tag:SAGE_tag (count,gene,pseudogene,sequence,transcript)
SAGE_tag:SAGE_tag_genomic_unique (count,gene,sequence)
SAGE_tag:SAGE_tag_most_three_prime (count,gene,pseudogene,sequence,transcript)
SAGE_tag:SAGE_tag_unambiguously_mapped (count,gene,pseudogene,sequence,transcript)
SNP:Allele (rflp,status,variation)
binding_site:PicTar (Note)
binding_site:miRanda (Note)
chromosome:Reference (ID,Name)
clone_insert_end:misc_feature (clone)
clone_insert_start:misc_feature (clone)
complex_substitution:Allele (variation)
deletion:Allele (variation)
experimental_result_region:Expr_profile (expr_profile)
experimental_result_region:cDNA_for_RNAi (sequence)
gene:landmark (locus)
insertion:Allele (variation)
inverted_repeat:inverted (Note)
oligo:misc_feature (placeholder_attribute)
operon:operon (operon)
polyA_signal_sequence:polyA_signal_sequence (feature)
polyA_site:polyA_site (feature)
pseudogene:Pseudogene (ID)
pseudogene:history (ID)
reagent:Oligo_set (oligo_set)
region:Genbank (genbank)
region:Genomic_canonical (Note,sequence)
region:Link (sequence)
region:Vancouver_fosmid (sequence)
region:binding_site (feature)
sequence_variant:Allele (variation)
sequence_variant:misc_feature (allele)
substitution:Allele (variation)
tandem_repeat:tandem (Note)
trans_splice_acceptor_site:SL1 (feature)
trans_splice_acceptor_site:SL2 (feature)
transposable_element:Transposon (transposon)
transposable_element:Transposon_CDS (cds)
transposable_element_insertion_site:Allele (variation)
transposable_element_insertion_site:Mos_insertion_allele (variation)
GFF2 coverage
Key:
++ Migrated
XX Bug to fix
-- To handle
BY FEATURE
==============================
XX .:ALLELE 29143
.:Clone_left_end 4420
.:Clone_right_end 3461
XX .:Sequence 3
XX .:intron 1763
XX .:oligo 46370
AUGUSTUS:CDS 145788
AUGUSTUS:five_prime_UTR 25766
AUGUSTUS:single 1066
AUGUSTUS:three_prime_UTR 24888
Allele:SNP 178067
Allele:complex_change_in_nucleotide_sequence 1817
Allele:deletion 5084
Allele:insertion 26
Allele:sequence_variant 4
Allele:substitution 3894
Allele:transposable_element_insertion_site 1533
BLAT_Caen_EST_BEST:expressed_sequence_match 359843
BLAT_Caen_EST_OTHER:expressed_sequence_match 2400258
BLAT_Caen_mRNA_BEST:expressed_sequence_match 1874
BLAT_Caen_mRNA_OTHER:expressed_sequence_match 2971
BLAT_EST_BEST:EST_match 1053771
BLAT_EST_OTHER:EST_match 515321
BLAT_NEMATODE:translated_nucleotide_match 1634503
BLAT_NEMBASE:translated_nucleotide_match 329574
BLAT_OST_BEST:expressed_sequence_match 106595
BLAT_OST_OTHER:expressed_sequence_match 100527
BLAT_RST_BEST:expressed_sequence_match 7408
BLAT_RST_OTHER:expressed_sequence_match 5563
BLAT_TC1_BEST:nucleotide_match 1057
BLAT_TC1_OTHER:nucleotide_match 3516
BLAT_WASHU:translated_nucleotide_match 584087
BLAT_mRNA_BEST:cDNA_match 20354
BLAT_mRNA_OTHER:cDNA_match 5411
BLAT_ncRNA_BEST:nucleotide_match 5973
BLAT_ncRNA_OTHER:nucleotide_match 566
CGH_allele:deletion 250
Chronogram:reagent 2020
Coding_transcript:Transcript 27997
Coding_transcript:coding_exon 179400
Coding_transcript:exon 185895
Coding_transcript:five_prime_UTR 20289
Coding_transcript:intron 157895
Coding_transcript:protein_coding_primary_transcript 3
Coding_transcript:three_prime_UTR 18212
Expr_pattern:reagent 2940
Expr_profile:experimental_result_region 17360
FGENESH:CDS 132760
Genbank:region 6534
GeneMarkHMM:CDS 24016
GeneMarkHMM:coding_exon 132967
GeneMarkHMM:exon 132967
GenePair_STS:PCR_product 46262
Genefinder:CDS 21182
Genefinder:coding_exon 133896
Genefinder:exon 133896
Genefinder:intron 112714
Genomic_canonical:region 3267
Link:region 25
Mos_insertion_allele:transposable_element_insertion_site 14305
Non_coding_transcript:exon 719
Non_coding_transcript:intron 616
Non_coding_transcript:nc_primary_transcript 103
Oligo_set:reagent 83900
Orfeome:PCR_product 19389
Promoterome:PCR_product 6598
Pseudogene:Pseudogene 1551
Pseudogene:exon 4580
Pseudogene:intron 3029
RNAi_primary:RNAi_reagent 155948
RNAi_secondary:RNAi_reagent 14141
RepeatMasker:repeat_region 117035
SAGE_tag:SAGE_tag 111039
SAGE_tag_genomic_unique:SAGE_tag 74833
SAGE_tag_most_three_prime:SAGE_tag 5355
SAGE_tag_unambiguously_mapped:SAGE_tag 69188
SL1:SL1_acceptor_site 7745
SL2:SL2_acceptor_site 2056
TEC_RED:nucleotide_match 8341
Transposon:transposable_element 111
Transposon_CDS:coding_exon 690
Transposon_CDS:exon 690
Transposon_CDS:intron 404
Transposon_CDS:transposable_element 286
Vancouver_fosmid:region 12874
WBPaper00032940|pmid19243610:DNAse_I_hypersensitivity 7095
binding_site:PicTar 11184
binding_site:binding_site 235
binding_site:miRanda 71796
binding_site_region:binding_site 4527
cDNA_for_RNAi:experimental_result_region 2828
curated:CDS 24114
curated:coding_exon 155909
curated:exon 155909
curated:gene 23833
curated:intron 131795
dust:low_complexity_region 189135
gene:gene 38211
gene:processed_transcript 72172
history:CDS 11285
history:Pseudogene 86
history:Transcript 71
history:coding_exon 82742
history:exon 83302
history:intron 71862
history:misc_feature 177
inverted:inverted_repeat 100620
jigsaw:CDS 20423
jigsaw:coding_exon 123207
jigsaw:exon 123207
jigsaw:intron 102784
landmark:gene 116
mGENE:CDS 126500
mGENE:five_prime_UTR 19316
mGENE:three_prime_UTR 20099
mSplicer_orf:CDS 26582
mSplicer_orf:coding_exon 168585
mSplicer_orf:exon 168585
mSplicer_transcript:CDS 26582
mSplicer_transcript:coding_exon 171688
mSplicer_transcript:exon 171688
mass_spec_genome:translated_nucleotide_match 136442
miRNA:exon 160
miRNA:miRNA_primary_transcript 160
nGASP:CDS 123207
nGASP:five_prime_UTR 18935
nGASP:three_prime_UTR 19882
ncRNA:RNAz 3672
ncRNA:exon 15480
ncRNA:intron 22
ncRNA:ncRNA_primary_transcript 15458
operon:operon 1148
polyA_signal_sequence:polyA_signal_sequence 2454
polyA_site:polyA_site 3028
rRNA:exon 22
rRNA:rRNA_primary_transcript 22
scRNA:exon 1
scRNA:scRNA_primary_transcript 1
snRNA:exon 99
snRNA:snRNA_primary_transcript 99
snlRNA:exon 4
snlRNA:snlRNA_primary_transcript 4
snoRNA:exon 139
snoRNA:snoRNA_primary_transcript 139
tRNA:exon 22
tRNA:tRNA_primary_transcript 22
tRNAscan-SE-1.23:exon 638
tRNAscan-SE-1.23:tRNA_primary_transcript 609
tandem:tandem_repeat 53032
twinscan:CDS 21681
twinscan:coding_exon 124073
twinscan:exon 124073
twinscan:intron 102393
waba_coding:nucleotide_match 306424
waba_strong:nucleotide_match 407017
waba_weak:nucleotide_match 970836
wublastx:protein_match 3059127

BY OCCURRENCE
==============================
wublastx:protein_match 3059127
BLAT_Caen_EST_OTHER:expressed_sequence_match 2400258
BLAT_NEMATODE:translated_nucleotide_match 1634503
BLAT_EST_BEST:EST_match 1053771
waba_weak:nucleotide_match 970836
BLAT_WASHU:translated_nucleotide_match 584087
BLAT_EST_OTHER:EST_match 515321
waba_strong:nucleotide_match 407017
BLAT_Caen_EST_BEST:expressed_sequence_match 359843
BLAT_NEMBASE:translated_nucleotide_match 329574
waba_coding:nucleotide_match 306424
dust:low_complexity_region 189135
Coding_transcript:exon 185895
Coding_transcript:coding_exon 179400
Allele:SNP 178067
mSplicer_transcript:exon 171688
mSplicer_transcript:coding_exon 171688
mSplicer_orf:coding_exon 168585
mSplicer_orf:exon 168585
Coding_transcript:intron 157895
RNAi_primary:RNAi_reagent 155948
curated:exon 155909
curated:coding_exon 155909
AUGUSTUS:CDS 145788
mass_spec_genome:translated_nucleotide_match 136442
Genefinder:exon 133896
Genefinder:coding_exon 133896
GeneMarkHMM:coding_exon 132967
GeneMarkHMM:exon 132967
FGENESH:CDS 132760
curated:intron 131795
mGENE:CDS 126500
twinscan:coding_exon 124073
twinscan:exon 124073
nGASP:CDS 123207
jigsaw:coding_exon 123207
jigsaw:exon 123207
RepeatMasker:repeat_region 117035
Genefinder:intron 112714
SAGE_tag:SAGE_tag 111039
BLAT_OST_BEST:expressed_sequence_match 106595
jigsaw:intron 102784
twinscan:intron 102393
inverted:inverted_repeat 100620
BLAT_OST_OTHER:expressed_sequence_match 100527
Oligo_set:reagent 83900
history:exon 83302
history:coding_exon 82742
SAGE_tag_genomic_unique:SAGE_tag 74833
gene:processed_transcript 72172
history:intron 71862
binding_site:miRanda 71796
SAGE_tag_unambiguously_mapped:SAGE_tag 69188
tandem:tandem_repeat 53032
.:oligo 46370
GenePair_STS:PCR_product 46262
gene:gene 38211
.:ALLELE 29143
Coding_transcript:Transcript 27997
mSplicer_orf:CDS 26582
mSplicer_transcript:CDS 26582
AUGUSTUS:five_prime_UTR 25766
AUGUSTUS:three_prime_UTR 24888
curated:CDS 24114
GeneMarkHMM:CDS 24016
curated:gene 23833
twinscan:CDS 21681
Genefinder:CDS 21182
jigsaw:CDS 20423
BLAT_mRNA_BEST:cDNA_match 20354
Coding_transcript:five_prime_UTR 20289
mGENE:three_prime_UTR 20099
nGASP:three_prime_UTR 19882
Orfeome:PCR_product 19389
mGENE:five_prime_UTR 19316
nGASP:five_prime_UTR 18935
Coding_transcript:three_prime_UTR 18212
Expr_profile:experimental_result_region 17360
ncRNA:exon 15480
ncRNA:ncRNA_primary_transcript 15458
Mos_insertion_allele:transposable_element_insertion_site 14305
RNAi_secondary:RNAi_reagent 14141
Vancouver_fosmid:region 12874
history:CDS 11285
binding_site:PicTar 11184
TEC_RED:nucleotide_match 8341
SL1:SL1_acceptor_site 7745
BLAT_RST_BEST:expressed_sequence_match 7408
WBPaper00032940|pmid19243610:DNAse_I_hypersensitivity 7095
Promoterome:PCR_product 6598
Genbank:region 6534
BLAT_ncRNA_BEST:nucleotide_match 5973
BLAT_RST_OTHER:expressed_sequence_match 5563
BLAT_mRNA_OTHER:cDNA_match 5411
SAGE_tag_most_three_prime:SAGE_tag 5355
Allele:deletion 5084
Pseudogene:exon 4580
binding_site_region:binding_site 4527
.:Clone_left_end 4420
Allele:substitution 3894
ncRNA:RNAz 3672
BLAT_TC1_OTHER:nucleotide_match 3516
.:Clone_right_end 3461
Genomic_canonical:region 3267
Pseudogene:intron 3029
polyA_site:polyA_site 3028
BLAT_Caen_mRNA_OTHER:expressed_sequence_match 2971
Expr_pattern:reagent 2940
cDNA_for_RNAi:experimental_result_region 2828
polyA_signal_sequence:polyA_signal_sequence 2454
SL2:SL2_acceptor_site 2056
Chronogram:reagent 2020
BLAT_Caen_mRNA_BEST:expressed_sequence_match 1874
Allele:complex_change_in_nucleotide_sequence 1817
.:intron 1763
Pseudogene:Pseudogene 1551
Allele:transposable_element_insertion_site 1533
operon:operon 1148
AUGUSTUS:single 1066
BLAT_TC1_BEST:nucleotide_match 1057
Non_coding_transcript:exon 719
Transposon_CDS:coding_exon 690
Transposon_CDS:exon 690
tRNAscan-SE-1.23:exon 638
Non_coding_transcript:intron 616
tRNAscan-SE-1.23:tRNA_primary_transcript 609
BLAT_ncRNA_OTHER:nucleotide_match 566
Transposon_CDS:intron 404
Transposon_CDS:transposable_element 286
CGH_allele:deletion 250
binding_site:binding_site 235
history:misc_feature 177
miRNA:miRNA_primary_transcript 160
miRNA:exon 160
snoRNA:exon 139
snoRNA:snoRNA_primary_transcript 139
landmark:gene 116
Transposon:transposable_element 111
Non_coding_transcript:nc_primary_transcript 103
snRNA:exon 99
snRNA:snRNA_primary_transcript 99
history:Pseudogene 86
history:Transcript 71
Allele:insertion 26
Link:region 25
ncRNA:intron 22
rRNA:rRNA_primary_transcript 22
tRNA:exon 22
tRNA:tRNA_primary_transcript 22
rRNA:exon 22
snlRNA:exon 4
Allele:sequence_variant 4
snlRNA:snlRNA_primary_transcript 4
Coding_transcript:protein_coding_primary_transcript 3
.:Sequence 3
scRNA:exon 1
scRNA:scRNA_primary_transcript 1
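A tally of this kind, keyed on source:type (columns 2 and 3), can be produced with a short script. The Python sketch below is only an illustration and is not necessarily how the counts above were generated. The sample rows are invented, and `tally_features` is a hypothetical helper name.

```python
from collections import Counter


def tally_features(lines):
    """Count feature rows per 'source:type' key (columns 2 and 3)."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # not a feature row
        counts[f"{cols[1]}:{cols[2]}"] += 1
    return counts


# Invented rows; only columns 1-8 are filled in for brevity.
SAMPLE = [
    "##gff-version 2",
    "\t".join(["I", "curated", "exon", "100", "200", ".", "+", "."]),
    "\t".join(["I", "curated", "exon", "300", "400", ".", "+", "."]),
    "\t".join(["I", "curated", "exon", "500", "600", ".", "+", "."]),
    "\t".join(["I", "curated", "CDS", "100", "600", ".", "+", "0"]),
    "\t".join(["I", "wublastx", "protein_match", "120", "180", ".", "+", "."]),
    "\t".join(["I", "wublastx", "protein_match", "320", "380", ".", "+", "."]),
]

counts = tally_features(SAMPLE)
by_feature = sorted(counts.items())    # sorted by key, like the "BY FEATURE" list
by_occurrence = counts.most_common()   # largest count first, like "BY OCCURRENCE"
print(by_occurrence[0])  # ('curated:exon', 3)
```

Python's default string sort is case-sensitive, so uppercase keys sort before lowercase ones; the BY FEATURE list above follows that order too.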