GFF3 features (C. elegans)

From WormBaseWiki
Revision as of 14:29, 24 September 2007 by Tharris

This article describes the features included in the C. elegans GFF3 files. For introductory information on the GFF3 files, please see the GFF3 features article.

The WormBase C. elegans GFF3 files can be downloaded from:

ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/genome_feature_tables/GFF3

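The attribute tags (column 9) for each feature group are listed in parentheses. Each group is written as type:source, that is, the feature type (column 3) followed by the source (column 2). Some attributes listed for a feature group may not be present on every feature within the group.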
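As a minimal sketch of how the column 9 attribute tags can be read, the Python below splits a GFF3 attribute string into tag/value pairs. It assumes the standard GFF3 encoding (pairs separated by ";", tag and value separated by "=", multiple values separated by ",", URL-style escaping). The function name parse_attributes and the example line are hypothetical and are not taken from a WormBase release.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute string (column 9) into a dict of tag -> list of values."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        # One tag may carry several comma-separated values; values are URL-escaped.
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attributes


# Hypothetical line, invented for illustration only.
example = (
    "I\tCoding_transcript\tmRNA\t100\t900\t.\t+\t.\t"
    "ID=mRNA:example.1;Parent=Gene:example;prediction_status=Confirmed"
)
columns = example.split("\t")
attrs = parse_attributes(columns[8])
print(attrs["ID"], attrs["Parent"])
```

Because not every attribute is available for every feature in a group, looking tags up with attrs.get(tag) rather than attrs[tag] is the safer choice when processing real files.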

FEATURES

  • For coding genes, a gene feature, one or more mRNA features and one or more CDS features are included. The source field for these features is "Coding_transcript".
Gene: gene:Coding_transcript                             (ID)
mRNA: mRNA:Coding_transcript                             (ID,Parent,Note,cds,prediction_status,wormpep)
CDS : CDS:Coding_transcript                              (ID,Parent,Note,status,wormpep)

Both mRNAs and CDSs are parts of the Gene features. The CDS features consist of multiple parts and are unified by a common ID attribute.

The following features are parts of the mRNA features.

exon:            exon:Coding_transcript                  (Parent)
five_prime_UTR:  five_prime_UTR:Coding_transcript        (Parent)
three_prime_UTR: three_prime_UTR:Coding_transcript       (Parent)
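Since a CDS is written as several rows that share one ID, a reader of the file has to collect those rows before treating them as a single feature. The sketch below shows one way to do that in Python. The helper names (attribute, group_cds_parts) and the sample rows, including their IDs and coordinates, are hypothetical and only illustrate the layout described above.

```python
from collections import defaultdict


def attribute(column9, tag):
    """Return the raw value of one tag from a GFF3 attribute string, or None."""
    for pair in column9.split(";"):
        key, _, value = pair.partition("=")
        if key == tag:
            return value
    return None


def group_cds_parts(lines):
    """Collect the (start, end) segments of CDS:Coding_transcript rows under their shared ID."""
    parts = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature row
        if cols[1] == "Coding_transcript" and cols[2] == "CDS":
            cds_id = attribute(cols[8], "ID")
            if cds_id is not None:
                parts[cds_id].append((int(cols[3]), int(cols[4])))
    # Sort segments by coordinate so the parts read left to right.
    return {cds_id: sorted(segments) for cds_id, segments in parts.items()}


# Hypothetical rows, invented for illustration only.
rows = [
    "I\tCoding_transcript\tCDS\t500\t600\t.\t+\t0\tID=CDS:example;Parent=Gene:example",
    "I\tCoding_transcript\tCDS\t100\t249\t.\t+\t0\tID=CDS:example;Parent=Gene:example",
    "I\tCoding_transcript\texon\t100\t249\t.\t+\t.\tParent=mRNA:example.1",
]
cds = group_cds_parts(rows)
print(cds)
```

The exon row is ignored here because exons carry only a Parent attribute; they would be attached to their mRNA by that Parent value instead.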
  • Predicted and retired genes: a CDS feature is included. The CDS features consist of multiple parts and are unified by a common ID attribute. The source field for these features varies as listed below.
CDS:GeneMarkHMM                                    (ID)
CDS:Genefinder                                     (ID)
CDS:history                                        (ID)
CDS:twinscan                                       (ID)
  • Non-coding genes (some predicted): a Gene feature and an ncRNA feature are included. The source field for this group varies as listed below.
Genes:

gene:Non_coding_transcript                         (ID)
gene:RNAz                                          (ID)
gene:miRNA                                         (ID)
gene:ncRNA                                         (ID)
gene:rRNA                                          (ID)
gene:scRNA                                         (ID)
gene:snRNA                                         (ID)
gene:snlRNA                                        (ID)
gene:snoRNA                                        (ID)
gene:tRNA                                          (ID)
gene:tRNAscan-SE-1.23                              (ID)

ncRNAs:

ncRNA:Non_coding_transcript                        (ID,Parent,Note)
ncRNA:RNAz                                         (ID,Parent,Note)
ncRNA:miRNA                                        (ID,Parent,Note)
ncRNA:ncRNA                                        (ID,Parent,Note)
ncRNA:rRNA                                         (ID,Parent,Note)
ncRNA:scRNA                                        (ID,Parent,Note)
ncRNA:snRNA                                        (ID,Parent,Note)
ncRNA:snlRNA                                       (ID,Parent,Note)
ncRNA:snoRNA                                       (ID,Parent,Note)
ncRNA:tRNA                                         (ID,Parent,Note)
ncRNA:tRNAscan-SE-1.23                             (ID,Parent)

ncRNA features are parts of the Gene features.

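The following exon and intron features are parts of the transcript features (ncRNA, or mRNA for the Coding_transcript introns) named in their Parent attribute.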

exon:Non_coding_transcript                         (Parent)
exon:miRNA                                         (Parent)
exon:ncRNA                                         (Parent)
exon:rRNA                                          (Parent)
exon:scRNA                                         (Parent)
exon:snRNA                                         (Parent)
exon:snlRNA                                        (Parent)
exon:snoRNA                                        (Parent)
exon:tRNA                                          (Parent)
exon:tRNAscan-SE-1.23                              (Parent)
intron:Coding_transcript                           (Parent,confirmed_cdna,confirmed_est,confirmed_false,confirmed_homology,
                                                    confirmed_inconsistent,confirmed_unknown,confirmed_utr)
intron:Non_coding_transcript                       (Parent,confirmed_cdna,confirmed_est,confirmed_false,confirmed_utr)
intron:ncRNA                                       (Parent,confirmed_est)
  • Alignment features: These features consist of multiple parts unified by a common ID attribute.
EST_match:BLAT_EST_BEST                            (ID,Target)
EST_match:BLAT_EST_OTHER                           (ID,Target)
RNAi_reagent:RNAi_primary                          (ID,Target)
RNAi_reagent:RNAi_secondary                        (ID,Target)
cDNA_match:BLAT_mRNA_BEST                          (ID,Target)
cDNA_match:BLAT_mRNA_OTHER                         (ID,Target)
expressed_sequence_match:BLAT_OST_BEST             (ID,Target)
expressed_sequence_match:BLAT_OST_OTHER            (ID,Target)
nucleotide_match:BLAT_TC1_BEST                     (ID,Target)
nucleotide_match:BLAT_TC1_OTHER                    (ID,Target)
nucleotide_match:BLAT_ncRNA_BEST                   (ID,Target)
nucleotide_match:BLAT_ncRNA_OTHER                  (ID,Target)
nucleotide_match:TEC_RED                           (ID,Target)
nucleotide_match:waba_coding                       (ID,Target)
nucleotide_match:waba_strong                       (ID,Target)
nucleotide_match:waba_weak                         (ID,Target)
protein_match:wublastx                             (ID,Target)
reagent:Expr_pattern                               (ID,Target)
repeat_region:RepeatMasker                         (ID,Target)
translated_nucleotide_match:BLAT_NEMATODE          (ID,Target,species)
translated_nucleotide_match:BLAT_NEMBASE           (ID,Target,species)
translated_nucleotide_match:BLAT_WASHU             (ID,Target,species)
translated_nucleotide_match:mass_spec_genome       (ID,Target,Note,cds_matches,protein_matches,times_observed)
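The alignment features above all carry a Target attribute. As a hedged sketch, the Python below parses a Target value under the assumption that it follows the form given in the GFF3 specification ("target_id start end [strand]"). Whether every WormBase release formats Target exactly this way is not stated in this article. The names Target and parse_target and the sample value are hypothetical.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class Target(NamedTuple):
    name: str
    start: int
    end: int
    strand: Optional[str]


def parse_target(value):
    """Parse a GFF3 Target value of the assumed form 'target_id start end [strand]'."""
    fields = value.split(" ")
    if len(fields) not in (3, 4):
        raise ValueError(f"unexpected Target value: {value!r}")
    strand = fields[3] if len(fields) == 4 else None
    # Spaces inside the target name are URL-escaped in GFF3, so unescape it last.
    return Target(unquote(fields[0]), int(fields[1]), int(fields[2]), strand)


# Hypothetical value, invented for illustration only.
hit = parse_target("EST:example 1 150 +")
print(hit.name, hit.start, hit.end, hit.strand)
```

Rows that share an ID can then be ordered by their Target start to reconstruct the full alignment of one matched sequence.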
  • Other: The remaining features are listed below.
PCR_product:GenePair_STS                           (pcr_product)
PCR_product:Orfeome                                (amplified,pcr_product)
PCR_product:Promoterome                            (pcr_product)
SAGE_tag:SAGE_tag                                  (count,gene,pseudogene,sequence,transcript)
SAGE_tag:SAGE_tag_genomic_unique                   (count,gene,sequence)
SAGE_tag:SAGE_tag_most_three_prime                 (count,gene,pseudogene,sequence,transcript)
SAGE_tag:SAGE_tag_unambiguously_mapped             (count,gene,pseudogene,sequence,transcript)
SNP:Allele                                         (rflp,status,variation)
binding_site:PicTar                                (Note)
binding_site:miRanda                               (Note)
chromosome:Reference                               (ID,Name)
clone_insert_end:misc_feature                      (clone)
clone_insert_start:misc_feature                    (clone)
complex_substitution:Allele                        (variation)
deletion:Allele                                    (variation)
experimental_result_region:Expr_profile            (expr_profile)
experimental_result_region:cDNA_for_RNAi           (sequence)
gene:landmark                                      (locus)
insertion:Allele                                   (variation)
inverted_repeat:inverted                           (Note)
oligo:misc_feature                                 (placeholder_attribute)
operon:operon                                      (operon)
polyA_signal_sequence:polyA_signal_sequence        (feature)
polyA_site:polyA_site                              (feature)
pseudogene:Pseudogene                              (ID)
pseudogene:history                                 (ID)
reagent:Oligo_set                                  (oligo_set)
region:Genbank                                     (genbank)
region:Genomic_canonical                           (Note,sequence)
region:Link                                        (sequence)
region:Vancouver_fosmid                            (sequence)
region:binding_site                                (feature)
sequence_variant:Allele                            (variation)
sequence_variant:misc_feature                      (allele)
substitution:Allele                                (variation)
tandem_repeat:tandem                               (Note)
trans_splice_acceptor_site:SL1                     (feature)
trans_splice_acceptor_site:SL2                     (feature)
transposable_element:Transposon                    (transposon)
transposable_element:Transposon_CDS                (cds)
transposable_element_insertion_site:Allele         (variation)
transposable_element_insertion_site:Mos_insertion_allele (variation)
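To check which of the feature groups listed above actually occur in a downloaded file, the type and source columns can be tallied. The Python below is a minimal sketch under the assumption that each group name combines the type (column 3) and the source (column 2) as type:source. The function name count_feature_groups and the sample rows are hypothetical.

```python
from collections import Counter


def count_feature_groups(lines):
    """Count features per 'type:source' group in an iterable of GFF3 lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature row
        source, feature_type = cols[1], cols[2]
        counts[f"{feature_type}:{source}"] += 1
    return counts


# Hypothetical rows, invented for illustration only.
sample = [
    "##gff-version 3",
    "I\tCoding_transcript\tgene\t100\t900\t.\t+\t.\tID=Gene:example",
    "I\tCoding_transcript\texon\t100\t249\t.\t+\t.\tParent=mRNA:example.1",
    "I\tCoding_transcript\texon\t500\t600\t.\t+\t.\tParent=mRNA:example.1",
    "I\tSL1\ttrans_splice_acceptor_site\t95\t96\t.\t+\t.\tfeature=example",
]
groups = count_feature_groups(sample)
for name, number in sorted(groups.items()):
    print(f"{name}\t{number}")
```

The function accepts any iterable of lines, so an open file handle for a downloaded GFF3 file can be passed directly in place of the sample list.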

CHANGES

WS177 - add more attributes, add pseudogenes, introns
WS176 - add Parent attribute to CDS:Coding_transcript
WS174 - start documentation