GFF3 features (C. elegans)

From WormBaseWiki
Revision as of 06:20, 4 November 2009

This article describes the features included in the C. elegans GFF3 files. For introductory information on the GFF3 files, please see the GFF3 features article.

The WormBase C. elegans GFF3 files can be downloaded from:

ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/genome_feature_tables/GFF3

The attribute tags (column 9) for each feature group are listed in parentheses. Not every feature within a group necessarily carries every attribute listed for that group.
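Reading these attribute tags programmatically means splitting column 9. The following is a minimal sketch, assuming the standard GFF3 column 9 syntax (semicolon-separated tag=value pairs, commas separating multiple values, %XX escaping of reserved characters). `parse_attributes` is a hypothetical helper, not part of any WormBase tool.

```python
from urllib.parse import unquote


def parse_attributes(column9: str) -> dict[str, list[str]]:
    """Parse a GFF3 column 9 string into {tag: [values]}."""
    attrs: dict[str, list[str]] = {}
    if column9 in ("", "."):
        return attrs  # "." means the feature has no attributes
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        # Multiple values are comma-separated; decode %XX escapes after splitting.
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs


attrs = parse_attributes("ID=000001;Name=WBGene00015062;Alias=trx-1")
print(attrs["Name"])          # ['WBGene00015062']
print(attrs.get("Note", []))  # []: not every feature carries every tag
```

Returning lists for every tag keeps multi-valued tags such as `Parent` uniform with single-valued ones.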

Features

Protein Coding Genes

The top-level feature for protein-coding genes is:

Gene:Coding_transcript        (ID)

Each protein-coding gene contains one or more:

mRNA:Coding_transcript        (ID,Parent,Note,cds,prediction_status,wormpep)


mRNAs consist of:

CDS                           (ID,Parent=mRNA ID,Note,status,wormpep)
five_prime_UTR                (ID,Parent=mRNA ID,Note,status,wormpep)
three_prime_UTR               (ID,Parent=mRNA ID,Note,status,wormpep)

Here's a full example, including a 3' UTR that is spliced into two parts sharing one ID:

II      Gene    gene    7752566 7753442 .       -       .       ID=000001;Name=WBGene00015062;Alias=trx-1
II      Coding_transcript       mRNA    7752566 7753442 .       -       .       ID=000002;Name=B0228.5a;Parent=000001
II      Coding_transcript       three_prime_UTR 7752566 7752645 .       -       .       ID=000008;Parent=000002
II      Coding_transcript       three_prime_UTR 7752700 7752745 .       -       .       ID=000008;Parent=000002
II      Coding_transcript       CDS     7752746 7752814 .       -       0       ID=000009;Parent=000002
II      Coding_transcript       CDS     7752869 7752940 .       -       0       ID=000011;Parent=000002
II      Coding_transcript       CDS     7753212 7753418 .       -       0       ID=000013;Parent=000002
II      Coding_transcript       five_prime_UTR  7753419 7753442 .       -       .       ID=000014;Parent=000002
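The example can be checked with a few lines of Python. This sketch groups lines that share an ID (showing the two-part 3' UTR) and sums the CDS segment lengths, using GFF3's 1-based, inclusive coordinates. It uses whitespace splitting only because no column in this example contains spaces; real GFF3 files are tab-delimited and should be split on tabs.

```python
from collections import defaultdict

# Feature lines from the trx-1 (B0228.5a) example above.
EXAMPLE = """
II Gene gene 7752566 7753442 . - . ID=000001;Name=WBGene00015062;Alias=trx-1
II Coding_transcript mRNA 7752566 7753442 . - . ID=000002;Name=B0228.5a;Parent=000001
II Coding_transcript three_prime_UTR 7752566 7752645 . - . ID=000008;Parent=000002
II Coding_transcript three_prime_UTR 7752700 7752745 . - . ID=000008;Parent=000002
II Coding_transcript CDS 7752746 7752814 . - 0 ID=000009;Parent=000002
II Coding_transcript CDS 7752869 7752940 . - 0 ID=000011;Parent=000002
II Coding_transcript CDS 7753212 7753418 . - 0 ID=000013;Parent=000002
II Coding_transcript five_prime_UTR 7753419 7753442 . - . ID=000014;Parent=000002
"""

parts = defaultdict(list)  # (type, ID) -> [(start, end), ...]
for line in EXAMPLE.strip().splitlines():
    seqid, source, ftype, start, end, score, strand, phase, col9 = line.split()
    attrs = dict(pair.split("=", 1) for pair in col9.split(";"))
    parts[(ftype, attrs["ID"])].append((int(start), int(end)))

# Lines sharing an ID form one discontinuous feature: the 3' UTR has two parts.
print(parts[("three_prime_UTR", "000008")])  # [(7752566, 7752645), (7752700, 7752745)]

# Coordinates are 1-based and inclusive, so a segment spans end - start + 1 bases.
cds_bases = sum(e - s + 1 for (ftype, _), segs in parts.items() if ftype == "CDS" for s, e in segs)
print(cds_bases, cds_bases % 3)  # 348 0
```

A total that is a multiple of 3 is a quick sanity check that the CDS segments describe a complete reading frame.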

Predicted and Retired Genes

Predicted and retired genes are distinguished by the source (column 2) of their CDS features, as listed below. SHOULD THIS BE THE SOURCE OF THE GENE?

CDS:GeneMarkHMM                                    (ID)
CDS:Genefinder                                     (ID)
CDS:history                                        (ID)
CDS:twinscan                                       (ID)
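Because these genes are told apart only by the CDS source, a filter only has to look at column 2. A minimal sketch: GeneMarkHMM, Genefinder and twinscan are gene-prediction programs, but treating `history` as the retired-gene source is an assumption drawn from this section's heading. `classify_cds` is a hypothetical helper, and the twinscan line is made up for illustration.

```python
from typing import Optional

# CDS sources from the list above. Mapping "history" to "retired" is an assumption.
PREDICTED_SOURCES = {"GeneMarkHMM", "Genefinder", "twinscan"}
RETIRED_SOURCES = {"history"}


def classify_cds(line: str) -> Optional[str]:
    """Label a tab-delimited GFF3 line by its CDS source; None for non-CDS lines."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != "CDS":
        return None
    source = cols[1]
    if source == "Coding_transcript":
        return "protein-coding gene"
    if source in PREDICTED_SOURCES:
        return "predicted"
    if source in RETIRED_SOURCES:
        return "retired"
    return "other"


# The first and third lines come from the trx-1 example; the twinscan line is hypothetical.
lines = [
    "\t".join(["II", "Coding_transcript", "CDS", "7752746", "7752814", ".", "-", "0", "ID=000009;Parent=000002"]),
    "\t".join(["II", "twinscan", "CDS", "100", "200", ".", "+", "0", "ID=hypothetical_cds"]),
    "\t".join(["II", "Coding_transcript", "mRNA", "7752566", "7753442", ".", "-", ".", "ID=000002;Name=B0228.5a;Parent=000001"]),
]
for line in lines:
    print(classify_cds(line))  # protein-coding gene, predicted, None
```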
  • Non-coding genes (some predicted): a gene feature and an ncRNA feature are included. The source field for this group varies as listed below.
Genes:

gene:Non_coding_transcript                         (ID)
gene:RNAz                                          (ID)
gene:miRNA                                         (ID)
gene:ncRNA                                         (ID)
gene:rRNA                                          (ID)
gene:scRNA                                         (ID)
gene:snRNA                                         (ID)
gene:snlRNA                                        (ID)
gene:snoRNA                                        (ID)
gene:tRNA                                          (ID)
gene:tRNAscan-SE-1.23                              (ID)

ncRNAs:

ncRNA:Non_coding_transcript                        (ID,Parent,Note)
ncRNA:RNAz                                         (ID,Parent,Note)
ncRNA:miRNA                                        (ID,Parent,Note)
ncRNA:ncRNA                                        (ID,Parent,Note)
ncRNA:rRNA                                         (ID,Parent,Note)
ncRNA:scRNA                                        (ID,Parent,Note)
ncRNA:snRNA                                        (ID,Parent,Note)
ncRNA:snlRNA                                       (ID,Parent,Note)
ncRNA:snoRNA                                       (ID,Parent,Note)
ncRNA:tRNA                                         (ID,Parent,Note)
ncRNA:tRNAscan-SE-1.23                             (ID,Parent)

ncRNA features are parts of the gene features, linked to them through their Parent attribute.
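The listings in this article use `type:source` keys (column 3, then column 2). The following sketch tallies lines by that key and checks that every ncRNA names a gene as its Parent. The miRNA gene and ncRNA lines, their IDs and coordinates are hypothetical, and `type_source` and `attrs_of` are illustrative helpers.

```python
from collections import Counter


def type_source(line: str) -> str:
    """Return the 'type:source' key used in this article's listings."""
    cols = line.split("\t")
    return f"{cols[2]}:{cols[1]}"


def attrs_of(line: str) -> dict:
    """Split column 9 into a tag -> raw value dict (no unescaping, for brevity)."""
    col9 = line.rstrip("\n").split("\t")[8]
    return dict(pair.split("=", 1) for pair in col9.split(";") if pair)


# Hypothetical miRNA gene and its ncRNA child; IDs and coordinates are made up.
lines = [
    "\t".join(["I", "miRNA", "gene", "1000", "1022", ".", "+", ".", "ID=hypothetical_gene"]),
    "\t".join(["I", "miRNA", "ncRNA", "1000", "1022", ".", "+", ".", "ID=hypothetical_ncrna;Parent=hypothetical_gene"]),
]

counts = Counter(type_source(l) for l in lines)
print(dict(counts))  # {'gene:miRNA': 1, 'ncRNA:miRNA': 1}

# In GFF3, Parent may list several comma-separated IDs; accept a match on any of them.
gene_ids = {attrs_of(l)["ID"] for l in lines if l.split("\t")[2] == "gene"}
orphans = [
    attrs_of(l)["ID"]
    for l in lines
    if l.split("\t")[2] == "ncRNA"
    and not any(p in gene_ids for p in attrs_of(l).get("Parent", "").split(","))
]
print(orphans)  # []
```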

The following features are parts of the transcript (mRNA or ncRNA) features.

exon:Non_coding_transcript                         (Parent)
exon:miRNA                                         (Parent)
exon:ncRNA                                         (Parent)
exon:rRNA                                          (Parent)
exon:scRNA                                         (Parent)
exon:snRNA                                         (Parent)
exon:snlRNA                                        (Parent)
exon:snoRNA                                        (Parent)
exon:tRNA                                          (Parent)
exon:tRNAscan-SE-1.23                              (Parent)
intron:Coding_transcript                           (Parent,confirmed_cdna,confirmed_est,confirmed_false,confirmed_homology,
                                                    confirmed_inconsistent,confirmed_unknown,confirmed_utr)
intron:Non_coding_transcript                       (Parent,confirmed_cdna,confirmed_est,confirmed_false,confirmed_utr)
intron:ncRNA                                       (Parent,confirmed_est)
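The intron features carry `confirmed_*` tags naming the kind of evidence attached to them. This sketch only reports which of the tags listed above are present; it does not interpret them, and the column 9 string is hypothetical (the tag names come from this article, while the values are placeholders, since their format is not documented here).

```python
# Evidence tags listed for intron features above.
CONFIRMED_TAGS = frozenset({
    "confirmed_cdna", "confirmed_est", "confirmed_false", "confirmed_homology",
    "confirmed_inconsistent", "confirmed_unknown", "confirmed_utr",
})


def intron_evidence(column9: str) -> set:
    """Return which confirmed_* tags appear in an intron's column 9 (values ignored)."""
    tags = {pair.split("=", 1)[0] for pair in column9.split(";") if pair}
    return tags & CONFIRMED_TAGS


# Hypothetical column 9 with placeholder values.
example = "Parent=hypothetical_transcript;confirmed_est=placeholder;confirmed_cdna=placeholder"
print(sorted(intron_evidence(example)))  # ['confirmed_cdna', 'confirmed_est']
```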
  • Alignment features: These features consist of multiple parts unified by a common ID attribute.
EST_match:BLAT_EST_BEST                            (ID,Target)
EST_match:BLAT_EST_OTHER                           (ID,Target)
RNAi_reagent:RNAi_primary                          (ID,Target)
RNAi_reagent:RNAi_secondary                        (ID,Target)
cDNA_match:BLAT_mRNA_BEST                          (ID,Target)
cDNA_match:BLAT_mRNA_OTHER                         (ID,Target)
expressed_sequence_match:BLAT_OST_BEST             (ID,Target)
expressed_sequence_match:BLAT_OST_OTHER            (ID,Target)
nucleotide_match:BLAT_TC1_BEST                     (ID,Target)
nucleotide_match:BLAT_TC1_OTHER                    (ID,Target)
nucleotide_match:BLAT_ncRNA_BEST                   (ID,Target)
nucleotide_match:BLAT_ncRNA_OTHER                  (ID,Target)
nucleotide_match:TEC_RED                           (ID,Target)
nucleotide_match:waba_coding                       (ID,Target)
nucleotide_match:waba_strong                       (ID,Target)
nucleotide_match:waba_weak                         (ID,Target)
protein_match:wublastx                             (ID,Target)
reagent:Expr_pattern                               (ID,Target)
repeat_region:RepeatMasker                         (ID,Target)
translated_nucleotide_match:BLAT_NEMATODE          (ID,Target,species)
translated_nucleotide_match:BLAT_NEMBASE           (ID,Target,species)
translated_nucleotide_match:BLAT_WASHU             (ID,Target,species)
translated_nucleotide_match:mass_spec_genome       (ID,Target,Note,cds_matches,protein_matches,times_observed)
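Each alignment feature carries a `Target` attribute, which the GFF3 specification defines as space-separated `target_id start end [strand]`. The sketch below parses that value and groups alignment parts by their shared ID. The two EST_match rows, their IDs, coordinates and EST name are hypothetical, and `parse_target` is an illustrative helper.

```python
from collections import defaultdict
from typing import NamedTuple, Optional
from urllib.parse import unquote


class Target(NamedTuple):
    target_id: str
    start: int
    end: int
    strand: Optional[str]  # strand is optional in the GFF3 Target syntax


def parse_target(value: str) -> Target:
    """Parse a GFF3 Target value: 'target_id start end [strand]'."""
    fields = value.split(" ")
    if len(fields) not in (3, 4):
        raise ValueError(f"malformed Target value: {value!r}")
    strand = fields[3] if len(fields) == 4 else None
    # Spaces inside target_id are escaped as %20, so unescape only after splitting.
    return Target(unquote(fields[0]), int(fields[1]), int(fields[2]), strand)


# Hypothetical two-part EST alignment: (type, source, start, end, column 9).
rows = [
    ("EST_match", "BLAT_EST_BEST", 5000, 5120, "ID=hypothetical_aln;Target=hypothetical_EST 1 121 +"),
    ("EST_match", "BLAT_EST_BEST", 5300, 5379, "ID=hypothetical_aln;Target=hypothetical_EST 122 201 +"),
]

alignments = defaultdict(list)  # ID -> [(genomic_start, genomic_end, Target), ...]
for ftype, source, start, end, col9 in rows:
    attrs = dict(pair.split("=", 1) for pair in col9.split(";"))
    alignments[attrs["ID"]].append((start, end, parse_target(attrs["Target"])))

for aln_id, parts in alignments.items():
    covered = sum(t.end - t.start + 1 for _, _, t in parts)
    print(aln_id, len(parts), "parts,", covered, "target bases")  # hypothetical_aln 2 parts, 201 target bases
```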
  • Other: The remaining features are listed below.
PCR_product:GenePair_STS                           (pcr_product)
PCR_product:Orfeome                                (amplified,pcr_product)
PCR_product:Promoterome                            (pcr_product)
SAGE_tag:SAGE_tag                                  (count,gene,pseudogene,sequence,transcript)
SAGE_tag:SAGE_tag_genomic_unique                   (count,gene,sequence)
SAGE_tag:SAGE_tag_most_three_prime                 (count,gene,pseudogene,sequence,transcript)
SAGE_tag:SAGE_tag_unambiguously_mapped             (count,gene,pseudogene,sequence,transcript)
SNP:Allele                                         (rflp,status,variation)
binding_site:PicTar                                (Note)
binding_site:miRanda                               (Note)
chromosome:Reference                               (ID,Name)
clone_insert_end:misc_feature                      (clone)
clone_insert_start:misc_feature                    (clone)
complex_substitution:Allele                        (variation)
deletion:Allele                                    (variation)
experimental_result_region:Expr_profile            (expr_profile)
experimental_result_region:cDNA_for_RNAi           (sequence)
gene:landmark                                      (locus)
insertion:Allele                                   (variation)
inverted_repeat:inverted                           (Note)
oligo:misc_feature                                 (placeholder_attribute)
operon:operon                                      (operon)
polyA_signal_sequence:polyA_signal_sequence        (feature)
polyA_site:polyA_site                              (feature)
pseudogene:Pseudogene                              (ID)
pseudogene:history                                 (ID)
reagent:Oligo_set                                  (oligo_set)
region:Genbank                                     (genbank)
region:Genomic_canonical                           (Note,sequence)
region:Link                                        (sequence)
region:Vancouver_fosmid                            (sequence)
region:binding_site                                (feature)
sequence_variant:Allele                            (variation)
sequence_variant:misc_feature                      (allele)
substitution:Allele                                (variation)
tandem_repeat:tandem                               (Note)
trans_splice_acceptor_site:SL1                     (feature)
trans_splice_acceptor_site:SL2                     (feature)
transposable_element:Transposon                    (transposon)
transposable_element:Transposon_CDS                (cds)
transposable_element_insertion_site:Allele         (variation)
transposable_element_insertion_site:Mos_insertion_allele (variation)

CHANGES

WS177 - add more attributes, add pseudogenes, introns
WS176 - add Parent attribute to CDS:Coding_transcript
WS174 - start documentation