GFF3 features (C. elegans)

From WormBaseWiki

This article describes the features included in the C. elegans GFF3 files. For introductory information on the GFF3 files, see the GFF3 features article.

The WormBase C. elegans GFF3 files can be downloaded from:

ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/genome_feature_tables/GFF3

The attribute tags (column 9) for each feature group are listed in parentheses. Not every feature in a group carries every attribute listed for that group.
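Column 9 follows the general GFF3 attribute syntax: tag=value pairs separated by ";", multiple values for one tag separated by ",", and reserved characters percent-encoded. The following is a minimal Python sketch of reading it, assuming only those rules from the GFF3 specification. The function name parse_attributes is hypothetical and not part of any WormBase tool.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse a GFF3 column-9 string into a dict of tag -> list of values."""
    attributes = {}
    column9 = column9.strip()
    if column9 in ("", "."):
        return attributes  # "." means the feature has no attributes
    for pair in column9.split(";"):
        if not pair:
            continue  # tolerate a trailing ";"
        tag, _, value = pair.partition("=")
        # Split multi-value attributes on "," before decoding, so that an
        # encoded comma (%2C) inside a value is not treated as a separator.
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attributes


attrs = parse_attributes("ID=000001;Name=WBGene00015062;Alias=trx-1")
print(attrs["Alias"])  # ['trx-1']
```

The attribute string is taken from the trx-1 gene line in the example further down. Values are always returned as lists so that multi-value tags such as Parent can be handled uniformly.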

Features

Protein Coding Genes

The top-level feature for protein-coding genes is:

Gene:Coding_transcript        (ID)

Each protein-coding gene contains one or more:

mRNA:Coding_transcript        (ID,Parent,Note,cds,prediction_status,wormpep)

mRNAs consist of:

CDS                           (ID,Parent=mRNA ID,Note,status,wormpep)
five_prime_UTR                (ID,Parent=mRNA ID,Note,status,wormpep)
three_prime_UTR               (ID,Parent=mRNA ID,Note,status,wormpep)

Here is a full example, including a 3' UTR that is spliced across two exons (both UTR segments share ID=000008):

II      Gene    gene    7752566 7753442 .       -       .       ID=000001;Name=WBGene00015062;Alias=trx-1
II      Coding_transcript       mRNA    7752566 7753442 .       -       .       ID=000002;Name=B0228.5a;Parent=000001
II      Coding_transcript       three_prime_UTR 7752566 7752645 .       -       .       ID=000008;Parent=000002
II      Coding_transcript       three_prime_UTR 7752700 7752745 .       -       .       ID=000008;Parent=000002
II      Coding_transcript       CDS     7752746 7752814 .       -       0       ID=000009;Parent=000002
II      Coding_transcript       CDS     7752869 7752940 .       -       0       ID=000011;Parent=000002
II      Coding_transcript       CDS     7753212 7753418 .       -       0       ID=000013;Parent=000002
II      Coding_transcript       five_prime_UTR  7753419 7753442 .       -       .       ID=000014;Parent=000002
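The ID and Parent attributes are enough to rebuild the gene model from these lines: lines sharing an ID are parts of one feature, and Parent links each feature to the one above it. The following is a minimal Python sketch using the example above. It assumes every line carries an ID, which holds for this example but not for every feature type (the exon and intron features below carry only Parent). The function name build_tree is hypothetical.

```python
from collections import defaultdict

# The trx-1 example above, whitespace-aligned as displayed. Real GFF3 files
# are tab-delimited; here split(None, 8) splits on runs of whitespace and
# keeps column 9 in one piece.
EXAMPLE = """
II Gene gene 7752566 7753442 . - . ID=000001;Name=WBGene00015062;Alias=trx-1
II Coding_transcript mRNA 7752566 7753442 . - . ID=000002;Name=B0228.5a;Parent=000001
II Coding_transcript three_prime_UTR 7752566 7752645 . - . ID=000008;Parent=000002
II Coding_transcript three_prime_UTR 7752700 7752745 . - . ID=000008;Parent=000002
II Coding_transcript CDS 7752746 7752814 . - 0 ID=000009;Parent=000002
II Coding_transcript CDS 7752869 7752940 . - 0 ID=000011;Parent=000002
II Coding_transcript CDS 7753212 7753418 . - 0 ID=000013;Parent=000002
II Coding_transcript five_prime_UTR 7753419 7753442 . - . ID=000014;Parent=000002
"""


def build_tree(text):
    """Merge lines that share an ID into one feature and link it to its Parent."""
    features = {}                 # ID -> {"type": ..., "parts": [(start, end), ...]}
    children = defaultdict(list)  # parent ID -> child IDs in file order
    for line in text.strip().splitlines():
        cols = line.split(None, 8)
        ftype, start, end, col9 = cols[2], int(cols[3]), int(cols[4]), cols[8]
        attrs = dict(pair.split("=", 1) for pair in col9.split(";") if pair)
        fid = attrs["ID"]  # assumes every line has an ID, as in this example
        if fid not in features:
            features[fid] = {"type": ftype, "parts": []}
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    children[parent].append(fid)
        # Each line sharing an ID is one part (segment) of the same feature.
        features[fid]["parts"].append((start, end))
    return features, children


features, children = build_tree(EXAMPLE)
for child_id in children["000002"]:
    print(child_id, features[child_id]["type"], features[child_id]["parts"])
# 000008 three_prime_UTR [(7752566, 7752645), (7752700, 7752745)]
# 000009 CDS [(7752746, 7752814)]
# 000011 CDS [(7752869, 7752940)]
# 000013 CDS [(7753212, 7753418)]
# 000014 five_prime_UTR [(7753419, 7753442)]
```

The two three_prime_UTR lines collapse into one feature with two parts, while each CDS line in this example has its own ID and so becomes a separate child of the mRNA.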

Display Notes

Whether a glyph draws these gene models correctly depends on which subfeatures are present:

Glyph            + intron    exons + CDS
 gene            works       fails
 so_transcript   fails       works
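One reading of this table is that the gene glyph needs intron subfeatures, while so_transcript needs exon subfeatures alongside the CDS. If so, both can be derived from an mRNA's CDS and UTR segments. The following is a minimal Python sketch assuming GFF3's 1-based, inclusive coordinates. The helper exons_and_introns is hypothetical and is not taken from the WormBase pipeline.

```python
def exons_and_introns(parts):
    """Derive exon and intron intervals from an mRNA's CDS/UTR segments.

    parts: list of (start, end) pairs, 1-based and inclusive.
    Segments that abut (a UTR ending at N and a CDS starting at N+1) belong
    to the same exon; the gaps between exons are the introns.
    """
    exons = []
    for start, end in sorted(parts):
        if exons and start <= exons[-1][1] + 1:
            # Abutting or overlapping segment: extend the current exon.
            exons[-1] = (exons[-1][0], max(exons[-1][1], end))
        else:
            exons.append((start, end))
    introns = [(prev_end + 1, next_start - 1)
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]
    return exons, introns


# CDS and UTR segments of B0228.5a from the example above.
parts = [(7752566, 7752645), (7752700, 7752745), (7752746, 7752814),
         (7752869, 7752940), (7753212, 7753418), (7753419, 7753442)]
exons, introns = exons_and_introns(parts)
print(introns)  # [(7752646, 7752699), (7752815, 7752868), (7752941, 7753211)]
```

Working on coordinates alone means strand does not matter here. For a minus-strand gene such as trx-1, the first intron in transcript order is the last one in the list.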

Predicted and Retired Genes

Predicted and retired genes are denoted by the source field (column 2) of the CDS feature. (Open question: should the source of the gene feature be used instead?)

CDS:GeneMarkHMM                                    (ID)
CDS:Genefinder                                     (ID)
CDS:history                                        (ID)
CDS:twinscan                                       (ID)
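Because the distinction is carried by the source column of CDS lines, a filter only needs columns 2 and 3. The following is a minimal Python sketch. Treating GeneMarkHMM, Genefinder and twinscan as prediction sources, and history as the marker for retired models, is an assumption based on the section heading. The function name cds_status is hypothetical.

```python
# Source values from the list above. Reading GeneMarkHMM, Genefinder and
# twinscan as gene predictions and "history" as retired models is an
# assumption, not stated explicitly in this article.
PREDICTION_SOURCES = {"GeneMarkHMM", "Genefinder", "twinscan"}
RETIRED_SOURCES = {"history"}


def cds_status(line):
    """Return 'predicted', 'retired' or 'other' for a CDS line, else None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "CDS":
        return None  # comment, directive or non-CDS feature
    source = fields[1]
    if source in PREDICTION_SOURCES:
        return "predicted"
    if source in RETIRED_SOURCES:
        return "retired"
    return "other"


# Hypothetical line; the coordinates and ID are placeholders.
print(cds_status("II\ttwinscan\tCDS\t100\t200\t.\t+\t0\tID=cds1"))  # predicted
```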
  • Non-coding genes (some predicted): a gene feature and an ncRNA feature are included. The source field for this group varies, as listed below.
Genes:

gene:Non_coding_transcript                         (ID)
gene:RNAz                                          (ID)
gene:miRNA                                         (ID)
gene:ncRNA                                         (ID)
gene:rRNA                                          (ID)
gene:scRNA                                         (ID)
gene:snRNA                                         (ID)
gene:snlRNA                                        (ID)
gene:snoRNA                                        (ID)
gene:tRNA                                          (ID)
gene:tRNAscan-SE-1.23                              (ID)

ncRNAs:

ncRNA:Non_coding_transcript                        (ID,Parent,Note)
ncRNA:RNAz                                         (ID,Parent,Note)
ncRNA:miRNA                                        (ID,Parent,Note)
ncRNA:ncRNA                                        (ID,Parent,Note)
ncRNA:rRNA                                         (ID,Parent,Note)
ncRNA:scRNA                                        (ID,Parent,Note)
ncRNA:snRNA                                        (ID,Parent,Note)
ncRNA:snlRNA                                       (ID,Parent,Note)
ncRNA:snoRNA                                       (ID,Parent,Note)
ncRNA:tRNA                                         (ID,Parent,Note)
ncRNA:tRNAscan-SE-1.23                             (ID,Parent)

ncRNA features are children (via Parent) of the gene features.

The following exon and intron features are children of the transcript features (mRNA or ncRNA).

exon:Non_coding_transcript                         (Parent)
exon:miRNA                                         (Parent)
exon:ncRNA                                         (Parent)
exon:rRNA                                          (Parent)
exon:scRNA                                         (Parent)
exon:snRNA                                         (Parent)
exon:snlRNA                                        (Parent)
exon:snoRNA                                        (Parent)
exon:tRNA                                          (Parent)
exon:tRNAscan-SE-1.23                              (Parent)
intron:Coding_transcript                           (Parent,confirmed_cdna,confirmed_est,confirmed_false,confirmed_homology,
                                                    confirmed_inconsistent,confirmed_unknown,confirmed_utr)
intron:Non_coding_transcript                       (Parent,confirmed_cdna,confirmed_est,confirmed_false,confirmed_utr)
intron:ncRNA                                       (Parent,confirmed_est)
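The intron features carry their supporting evidence as confirmed_* attribute tags. This article lists the tags but not the format of their values, so the following minimal Python sketch only reports which tags are present. It does not assume that confirmed_false or confirmed_inconsistent count as support. The function name confirmation_tags is hypothetical.

```python
from urllib.parse import unquote


def confirmation_tags(line):
    """Return the set of confirmed_* attribute tags on a GFF3 intron line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "intron":
        return set()  # not an intron feature line
    tags = set()
    for pair in fields[8].split(";"):
        tag = unquote(pair.partition("=")[0])
        if tag.startswith("confirmed_"):
            tags.add(tag)
    return tags


# Hypothetical intron line; the coordinates and attribute values are placeholders.
line = "II\tCoding_transcript\tintron\t100\t200\t.\t-\t.\tParent=000002;confirmed_est=EST1;confirmed_cdna=CDNA1"
print(sorted(confirmation_tags(line)))  # ['confirmed_cdna', 'confirmed_est']
```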
  • Alignment features: each alignment consists of multiple parts (one per line) that share a common ID attribute.
EST_match:BLAT_EST_BEST                            (ID,Target)
EST_match:BLAT_EST_OTHER                           (ID,Target)
RNAi_reagent:RNAi_primary                          (ID,Target)
RNAi_reagent:RNAi_secondary                        (ID,Target)
cDNA_match:BLAT_mRNA_BEST                          (ID,Target)
cDNA_match:BLAT_mRNA_OTHER                         (ID,Target)
expressed_sequence_match:BLAT_OST_BEST             (ID,Target)
expressed_sequence_match:BLAT_OST_OTHER            (ID,Target)
nucleotide_match:BLAT_TC1_BEST                     (ID,Target)
nucleotide_match:BLAT_TC1_OTHER                    (ID,Target)
nucleotide_match:BLAT_ncRNA_BEST                   (ID,Target)
nucleotide_match:BLAT_ncRNA_OTHER                  (ID,Target)
nucleotide_match:TEC_RED                           (ID,Target)
nucleotide_match:waba_coding                       (ID,Target)
nucleotide_match:waba_strong                       (ID,Target)
nucleotide_match:waba_weak                         (ID,Target)
protein_match:wublastx                             (ID,Target)
reagent:Expr_pattern                               (ID,Target)
repeat_region:RepeatMasker                         (ID,Target)
translated_nucleotide_match:BLAT_NEMATODE          (ID,Target,species)
translated_nucleotide_match:BLAT_NEMBASE           (ID,Target,species)
translated_nucleotide_match:BLAT_WASHU             (ID,Target,species)
translated_nucleotide_match:mass_spec_genome       (ID,Target,Note,cds_matches,protein_matches,times_observed)
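Because each alignment is split over several lines sharing one ID, reading it back means grouping lines by ID and parsing Target. The following is a minimal Python sketch assuming the GFF3 specification's Target syntax: "target_id start end [strand]", space-separated, with spaces inside target_id percent-encoded. The function name group_alignments and the sample lines (EST_A, aln1, coordinates) are hypothetical.

```python
from collections import defaultdict
from urllib.parse import unquote


def group_alignments(lines):
    """Group the parts of alignment features by their shared ID attribute."""
    alignments = defaultdict(lambda: {"target": None, "genomic": [], "target_spans": []})
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        attrs = dict(p.split("=", 1) for p in fields[8].split(";") if p)
        # GFF3 Target syntax: "target_id start end [strand]", space-separated,
        # with any space inside target_id percent-encoded.
        target_id, t_start, t_end = attrs["Target"].split(" ")[:3]
        record = alignments[attrs["ID"]]
        record["target"] = unquote(target_id)
        record["genomic"].append((int(fields[3]), int(fields[4])))
        record["target_spans"].append((int(t_start), int(t_end)))
    return dict(alignments)


# Hypothetical two-part EST alignment: 100 bp + 150 bp of EST_A on chromosome II.
lines = [
    "II\tBLAT_EST_BEST\tEST_match\t1000\t1099\t.\t+\t.\tID=aln1;Target=EST_A 1 100 +",
    "II\tBLAT_EST_BEST\tEST_match\t1500\t1649\t.\t+\t.\tID=aln1;Target=EST_A 101 250 +",
]
result = group_alignments(lines)
print(result["aln1"]["genomic"])  # [(1000, 1099), (1500, 1649)]
```

The genomic gap between the two parts (1100 to 1499) is where the EST is spliced. Keeping the per-part Target spans makes that visible without re-running the alignment.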
  • Other: The remaining features are listed below.
PCR_product:GenePair_STS                           (pcr_product)
PCR_product:Orfeome                                (amplified,pcr_product)
PCR_product:Promoterome                            (pcr_product)
SAGE_tag:SAGE_tag                                  (count,gene,pseudogene,sequence,transcript)
SAGE_tag:SAGE_tag_genomic_unique                   (count,gene,sequence)
SAGE_tag:SAGE_tag_most_three_prime                 (count,gene,pseudogene,sequence,transcript)
SAGE_tag:SAGE_tag_unambiguously_mapped             (count,gene,pseudogene,sequence,transcript)
SNP:Allele                                         (rflp,status,variation)
binding_site:PicTar                                (Note)
binding_site:miRanda                               (Note)
chromosome:Reference                               (ID,Name)
clone_insert_end:misc_feature                      (clone)
clone_insert_start:misc_feature                    (clone)
complex_substitution:Allele                        (variation)
deletion:Allele                                    (variation)
experimental_result_region:Expr_profile            (expr_profile)
experimental_result_region:cDNA_for_RNAi           (sequence)
gene:landmark                                      (locus)
insertion:Allele                                   (variation)
inverted_repeat:inverted                           (Note)
oligo:misc_feature                                 (placeholder_attribute)
operon:operon                                      (operon)
polyA_signal_sequence:polyA_signal_sequence        (feature)
polyA_site:polyA_site                              (feature)
pseudogene:Pseudogene                              (ID)
pseudogene:history                                 (ID)
reagent:Oligo_set                                  (oligo_set)
region:Genbank                                     (genbank)
region:Genomic_canonical                           (Note,sequence)
region:Link                                        (sequence)
region:Vancouver_fosmid                            (sequence)
region:binding_site                                (feature)
sequence_variant:Allele                            (variation)
sequence_variant:misc_feature                      (allele)
substitution:Allele                                (variation)
tandem_repeat:tandem                               (Note)
trans_splice_acceptor_site:SL1                     (feature)
trans_splice_acceptor_site:SL2                     (feature)
transposable_element:Transposon                    (transposon)
transposable_element:Transposon_CDS                (cds)
transposable_element_insertion_site:Allele         (variation)
transposable_element_insertion_site:Mos_insertion_allele (variation)

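GFF2 coverage

The lists below track which feature types from the GFF2 files have been migrated to GFF3. Each line gives a feature key followed by its number of occurrences. The first list is sorted alphabetically (BY FEATURE), and the second is sorted by count (BY OCCURRENCE).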

Key:

++ Migrated
XX Bug to fix
-- To handle
BY FEATURE
==============================
XX .:ALLELE	29143
.:Clone_left_end	4420
.:Clone_right_end	3461
XX .:Sequence	3
XX .:intron	1763
XX .:oligo	46370
AUGUSTUS:CDS	145788
AUGUSTUS:five_prime_UTR	25766
AUGUSTUS:single	1066
AUGUSTUS:three_prime_UTR	24888
Allele:SNP	178067
Allele:complex_change_in_nucleotide_sequence	1817
Allele:deletion	5084
Allele:insertion	26
Allele:sequence_variant	4
Allele:substitution	3894
Allele:transposable_element_insertion_site	1533
BLAT_Caen_EST_BEST:expressed_sequence_match	359843
BLAT_Caen_EST_OTHER:expressed_sequence_match	2400258
BLAT_Caen_mRNA_BEST:expressed_sequence_match	1874
BLAT_Caen_mRNA_OTHER:expressed_sequence_match	2971
BLAT_EST_BEST:EST_match	1053771
BLAT_EST_OTHER:EST_match	515321
BLAT_NEMATODE:translated_nucleotide_match	1634503
BLAT_NEMBASE:translated_nucleotide_match	329574
BLAT_OST_BEST:expressed_sequence_match	106595
BLAT_OST_OTHER:expressed_sequence_match	100527
BLAT_RST_BEST:expressed_sequence_match	7408
BLAT_RST_OTHER:expressed_sequence_match	5563
BLAT_TC1_BEST:nucleotide_match	1057
BLAT_TC1_OTHER:nucleotide_match	3516
BLAT_WASHU:translated_nucleotide_match	584087
BLAT_mRNA_BEST:cDNA_match	20354
BLAT_mRNA_OTHER:cDNA_match	5411
BLAT_ncRNA_BEST:nucleotide_match	5973
BLAT_ncRNA_OTHER:nucleotide_match	566
CGH_allele:deletion	250
Chronogram:reagent	2020
Coding_transcript:Transcript	27997
Coding_transcript:coding_exon	179400
Coding_transcript:exon	185895
Coding_transcript:five_prime_UTR	20289
Coding_transcript:intron	157895
Coding_transcript:protein_coding_primary_transcript	3
Coding_transcript:three_prime_UTR	18212
Expr_pattern:reagent	2940
Expr_profile:experimental_result_region	17360
FGENESH:CDS	132760
Genbank:region	6534
GeneMarkHMM:CDS	24016
GeneMarkHMM:coding_exon	132967
GeneMarkHMM:exon	132967
GenePair_STS:PCR_product	46262
Genefinder:CDS	21182
Genefinder:coding_exon	133896
Genefinder:exon	133896
Genefinder:intron	112714
Genomic_canonical:region	3267
Link:region	25
Mos_insertion_allele:transposable_element_insertion_site	14305
Non_coding_transcript:exon	719
Non_coding_transcript:intron	616
Non_coding_transcript:nc_primary_transcript	103
Oligo_set:reagent	83900
Orfeome:PCR_product	19389
Promoterome:PCR_product	6598
Pseudogene:Pseudogene	1551
Pseudogene:exon	4580
Pseudogene:intron	3029
RNAi_primary:RNAi_reagent	155948
RNAi_secondary:RNAi_reagent	14141
RepeatMasker:repeat_region	117035
SAGE_tag:SAGE_tag	111039
SAGE_tag_genomic_unique:SAGE_tag	74833
SAGE_tag_most_three_prime:SAGE_tag	5355
SAGE_tag_unambiguously_mapped:SAGE_tag	69188
SL1:SL1_acceptor_site	7745
SL2:SL2_acceptor_site	2056
TEC_RED:nucleotide_match	8341
Transposon:transposable_element	111
Transposon_CDS:coding_exon	690
Transposon_CDS:exon	690
Transposon_CDS:intron	404
Transposon_CDS:transposable_element	286
Vancouver_fosmid:region	12874
WBPaper00032940|pmid19243610:DNAse_I_hypersensitivity	7095
binding_site:PicTar	11184
binding_site:binding_site	235
binding_site:miRanda	71796
binding_site_region:binding_site	4527
cDNA_for_RNAi:experimental_result_region	2828
curated:CDS	24114
curated:coding_exon	155909
curated:exon	155909
curated:gene	23833
curated:intron	131795
dust:low_complexity_region	189135
gene:gene	38211
gene:processed_transcript	72172
history:CDS	11285
history:Pseudogene	86
history:Transcript	71
history:coding_exon	82742
history:exon	83302
history:intron	71862
history:misc_feature	177
inverted:inverted_repeat	100620
jigsaw:CDS	20423
jigsaw:coding_exon	123207
jigsaw:exon	123207
jigsaw:intron	102784
landmark:gene	116
mGENE:CDS	126500
mGENE:five_prime_UTR	19316
mGENE:three_prime_UTR	20099
mSplicer_orf:CDS	26582
mSplicer_orf:coding_exon	168585
mSplicer_orf:exon	168585
mSplicer_transcript:CDS	26582
mSplicer_transcript:coding_exon	171688
mSplicer_transcript:exon	171688
mass_spec_genome:translated_nucleotide_match	136442
miRNA:exon	160
miRNA:miRNA_primary_transcript	160
nGASP:CDS	123207
nGASP:five_prime_UTR	18935
nGASP:three_prime_UTR	19882
ncRNA:RNAz	3672
ncRNA:exon	15480
ncRNA:intron	22
ncRNA:ncRNA_primary_transcript	15458
operon:operon	1148
polyA_signal_sequence:polyA_signal_sequence	2454
polyA_site:polyA_site	3028
rRNA:exon	22
rRNA:rRNA_primary_transcript	22
scRNA:exon	1
scRNA:scRNA_primary_transcript	1
snRNA:exon	99
snRNA:snRNA_primary_transcript	99
snlRNA:exon	4
snlRNA:snlRNA_primary_transcript	4
snoRNA:exon	139
snoRNA:snoRNA_primary_transcript	139
tRNA:exon	22
tRNA:tRNA_primary_transcript	22
tRNAscan-SE-1.23:exon	638
tRNAscan-SE-1.23:tRNA_primary_transcript	609
tandem:tandem_repeat	53032
twinscan:CDS	21681
twinscan:coding_exon	124073
twinscan:exon	124073
twinscan:intron	102393
waba_coding:nucleotide_match	306424
waba_strong:nucleotide_match	407017
waba_weak:nucleotide_match	970836
wublastx:protein_match	3059127



BY OCCURRENCE
==============================
wublastx:protein_match	3059127
BLAT_Caen_EST_OTHER:expressed_sequence_match	2400258
BLAT_NEMATODE:translated_nucleotide_match	1634503
BLAT_EST_BEST:EST_match	1053771
waba_weak:nucleotide_match	970836
BLAT_WASHU:translated_nucleotide_match	584087
BLAT_EST_OTHER:EST_match	515321
waba_strong:nucleotide_match	407017
BLAT_Caen_EST_BEST:expressed_sequence_match	359843
BLAT_NEMBASE:translated_nucleotide_match	329574
waba_coding:nucleotide_match	306424
dust:low_complexity_region	189135
Coding_transcript:exon	185895
Coding_transcript:coding_exon	179400
Allele:SNP	178067
mSplicer_transcript:exon	171688
mSplicer_transcript:coding_exon	171688
mSplicer_orf:coding_exon	168585
mSplicer_orf:exon	168585
Coding_transcript:intron	157895
RNAi_primary:RNAi_reagent	155948
curated:exon	155909
curated:coding_exon	155909
AUGUSTUS:CDS	145788
mass_spec_genome:translated_nucleotide_match	136442
Genefinder:exon	133896
Genefinder:coding_exon	133896
GeneMarkHMM:coding_exon	132967
GeneMarkHMM:exon	132967
FGENESH:CDS	132760
curated:intron	131795
mGENE:CDS	126500
twinscan:coding_exon	124073
twinscan:exon	124073
nGASP:CDS	123207
jigsaw:coding_exon	123207
jigsaw:exon	123207
RepeatMasker:repeat_region	117035
Genefinder:intron	112714
SAGE_tag:SAGE_tag	111039
BLAT_OST_BEST:expressed_sequence_match	106595
jigsaw:intron	102784
twinscan:intron	102393
inverted:inverted_repeat	100620
BLAT_OST_OTHER:expressed_sequence_match	100527
Oligo_set:reagent	83900
history:exon	83302
history:coding_exon	82742
SAGE_tag_genomic_unique:SAGE_tag	74833
gene:processed_transcript	72172
history:intron	71862
binding_site:miRanda	71796
SAGE_tag_unambiguously_mapped:SAGE_tag	69188
tandem:tandem_repeat	53032
.:oligo	46370
GenePair_STS:PCR_product	46262
gene:gene	38211
.:ALLELE	29143
Coding_transcript:Transcript	27997
mSplicer_orf:CDS	26582
mSplicer_transcript:CDS	26582
AUGUSTUS:five_prime_UTR	25766
AUGUSTUS:three_prime_UTR	24888
curated:CDS	24114
GeneMarkHMM:CDS	24016
curated:gene	23833
twinscan:CDS	21681
Genefinder:CDS	21182
jigsaw:CDS	20423
BLAT_mRNA_BEST:cDNA_match	20354
Coding_transcript:five_prime_UTR	20289
mGENE:three_prime_UTR	20099
nGASP:three_prime_UTR	19882
Orfeome:PCR_product	19389
mGENE:five_prime_UTR	19316
nGASP:five_prime_UTR	18935
Coding_transcript:three_prime_UTR	18212
Expr_profile:experimental_result_region	17360
ncRNA:exon	15480
ncRNA:ncRNA_primary_transcript	15458
Mos_insertion_allele:transposable_element_insertion_site	14305
RNAi_secondary:RNAi_reagent	14141
Vancouver_fosmid:region	12874
history:CDS	11285
binding_site:PicTar	11184
TEC_RED:nucleotide_match	8341
SL1:SL1_acceptor_site	7745
BLAT_RST_BEST:expressed_sequence_match	7408
WBPaper00032940|pmid19243610:DNAse_I_hypersensitivity	7095
Promoterome:PCR_product	6598
Genbank:region	6534
BLAT_ncRNA_BEST:nucleotide_match	5973
BLAT_RST_OTHER:expressed_sequence_match	5563
BLAT_mRNA_OTHER:cDNA_match	5411
SAGE_tag_most_three_prime:SAGE_tag	5355
Allele:deletion	5084
Pseudogene:exon	4580
binding_site_region:binding_site	4527
.:Clone_left_end	4420
Allele:substitution	3894
ncRNA:RNAz	3672
BLAT_TC1_OTHER:nucleotide_match	3516
.:Clone_right_end	3461
Genomic_canonical:region	3267
Pseudogene:intron	3029
polyA_site:polyA_site	3028
BLAT_Caen_mRNA_OTHER:expressed_sequence_match	2971
Expr_pattern:reagent	2940
cDNA_for_RNAi:experimental_result_region	2828
polyA_signal_sequence:polyA_signal_sequence	2454
SL2:SL2_acceptor_site	2056
Chronogram:reagent	2020
BLAT_Caen_mRNA_BEST:expressed_sequence_match	1874
Allele:complex_change_in_nucleotide_sequence	1817
.:intron	1763
Pseudogene:Pseudogene	1551
Allele:transposable_element_insertion_site	1533
operon:operon	1148
AUGUSTUS:single	1066
BLAT_TC1_BEST:nucleotide_match	1057
Non_coding_transcript:exon	719
Transposon_CDS:coding_exon	690
Transposon_CDS:exon	690
tRNAscan-SE-1.23:exon	638
Non_coding_transcript:intron	616
tRNAscan-SE-1.23:tRNA_primary_transcript	609
BLAT_ncRNA_OTHER:nucleotide_match	566
Transposon_CDS:intron	404
Transposon_CDS:transposable_element	286
CGH_allele:deletion	250
binding_site:binding_site	235
history:misc_feature	177
miRNA:miRNA_primary_transcript	160
miRNA:exon	160
snoRNA:exon	139
snoRNA:snoRNA_primary_transcript	139
landmark:gene	116
Transposon:transposable_element	111
Non_coding_transcript:nc_primary_transcript	103
snRNA:exon	99
snRNA:snRNA_primary_transcript	99
history:Pseudogene	86
history:Transcript	71
Allele:insertion	26
Link:region	25
ncRNA:intron	22
rRNA:rRNA_primary_transcript	22
tRNA:exon	22
tRNA:tRNA_primary_transcript	22
rRNA:exon	22
snlRNA:exon	4
Allele:sequence_variant	4
snlRNA:snlRNA_primary_transcript	4
Coding_transcript:protein_coding_primary_transcript	3
.:Sequence	3
scRNA:exon	1
scRNA:scRNA_primary_transcript	1
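Lists like the two above can be regenerated from a GFF2 file. The following is a minimal Python sketch assuming tab-delimited GFF2 input, with keys built from column 2 and column 3 (source:type, as in AUGUSTUS:CDS). The helper names count_features and report, and the sample lines, are hypothetical. Python's default string ordering places uppercase before lowercase, which appears consistent with the BY FEATURE list. Ties in BY OCCURRENCE may come out in a different order than shown above, and the XX status markers are not reproduced.

```python
from collections import Counter


def count_features(lines):
    """Count GFF2 feature lines by 'source:type' (columns 2 and 3)."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a feature line
        counts[f"{fields[1]}:{fields[2]}"] += 1
    return counts


def report(counts):
    """Print an alphabetical and a by-count table in the layout used above."""
    print("BY FEATURE")
    print("=" * 30)
    for key in sorted(counts):
        print(f"{key}\t{counts[key]}")
    print()
    print("BY OCCURRENCE")
    print("=" * 30)
    for key, n in counts.most_common():
        print(f"{key}\t{n}")


# Hypothetical GFF2 lines; coordinates and group values are placeholders.
lines = [
    "##gff-version 2",
    'II\tcurated\tCDS\t100\t200\t.\t+\t0\tCDS "example1"',
    'II\tcurated\tCDS\t300\t400\t.\t+\t0\tCDS "example1"',
    'II\ttwinscan\tCDS\t100\t200\t.\t+\t0\tCDS "example2"',
]
counts = count_features(lines)
report(counts)
```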