GFF3 features (C. elegans)
Revision as of 06:20, 4 November 2009
This article describes the features included in the C. elegans GFF3 files. For introductory information on the GFF3 files, please see the GFF3 features article.
The WormBase C. elegans GFF3 files can be downloaded from:
ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/genome_feature_tables/GFF3
The attribute tags (column 9) for each feature group are listed in parentheses. Some attributes listed for a feature group may not be present on every feature in that group.
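Under the GFF3 specification, column 9 holds `tag=value` pairs separated by semicolons, with multiple values for one tag separated by commas and reserved characters percent-encoded. The sketch below splits that column into a dictionary. The function name `parse_attributes` is hypothetical, and the sketch ignores edge cases a full GFF3 library would handle. Because some listed attributes may be absent on a given line, lookups should use `.get()` rather than indexing.

```python
from urllib.parse import unquote

def parse_attributes(column9):
    """Split a GFF3 column-9 string into {tag: [values]} (hypothetical helper)."""
    attrs = {}
    if column9 in ("", "."):
        return attrs  # "." means no attributes
    for pair in column9.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        # Split on unescaped commas first, then decode escapes such as %3B.
        attrs[tag] = [unquote(v) for v in value.split(",")]
    return attrs

# The gene line from the example in this article.
line = "II\tGene\tgene\t7752566\t7753442\t.\t-\t.\tID=000001;Name=WBGene00015062;Alias=trx-1"
attrs = parse_attributes(line.split("\t")[8])
print(attrs)  # {'ID': ['000001'], 'Name': ['WBGene00015062'], 'Alias': ['trx-1']}
print(attrs.get("Note"))  # None: this line carries no Note attribute
```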
Features
Protein Coding Genes
The top-level feature for protein-coding genes is:
Gene:Coding_transcript (ID)
Each protein-coding gene contains one or more:
mRNA:Coding_transcript (ID,Parent,Note,cds,prediction_status,wormpep)
mRNAs consist of:
CDS (ID,Parent=mRNA ID,Note,status,wormpep)
five_prime_UTR (ID,Parent=mRNA ID,Note,status,wormpep)
three_prime_UTR (ID,Parent=mRNA ID,Note,status,wormpep)
Here is a full example, including a 3' UTR that is split by an intron (two three_prime_UTR lines sharing one ID):
II	Gene	gene	7752566	7753442	.	-	.	ID=000001;Name=WBGene00015062;Alias=trx-1
II	Coding_transcript	mRNA	7752566	7753442	.	-	.	ID=000002;Name=B0228.5a;Parent=000001
II	Coding_transcript	three_prime_UTR	7752566	7752645	.	-	.	ID=000008;Parent=000002
II	Coding_transcript	three_prime_UTR	7752700	7752745	.	-	.	ID=000008;Parent=000002
II	Coding_transcript	CDS	7752746	7752814	.	-	0	ID=000009;Parent=000002
II	Coding_transcript	CDS	7752869	7752940	.	-	0	ID=000011;Parent=000002
II	Coding_transcript	CDS	7753212	7753418	.	-	0	ID=000013;Parent=000002
II	Coding_transcript	five_prime_UTR	7753419	7753442	.	-	.	ID=000014;Parent=000002
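The ID and Parent attributes in the example above can be followed programmatically. The sketch below parses those eight records, indexes them by ID and by Parent, and checks three things: the two three_prime_UTR lines form a single feature (ID=000008), sorting the mRNA's parts by descending start gives 5' to 3' order on the minus strand, and the CDS segments total a whole number of codons. It is a minimal sketch with hypothetical names (`parse_gff3`, `by_id`, `children`); it assumes single-valued attributes and no percent-escapes, which holds for this example but not for every WormBase line.

```python
from collections import defaultdict

# The example records from above; no field contains a space, so the
# tab-separated GFF3 text can be rebuilt by swapping spaces for tabs.
ROWS = [
    "II Gene gene 7752566 7753442 . - . ID=000001;Name=WBGene00015062;Alias=trx-1",
    "II Coding_transcript mRNA 7752566 7753442 . - . ID=000002;Name=B0228.5a;Parent=000001",
    "II Coding_transcript three_prime_UTR 7752566 7752645 . - . ID=000008;Parent=000002",
    "II Coding_transcript three_prime_UTR 7752700 7752745 . - . ID=000008;Parent=000002",
    "II Coding_transcript CDS 7752746 7752814 . - 0 ID=000009;Parent=000002",
    "II Coding_transcript CDS 7752869 7752940 . - 0 ID=000011;Parent=000002",
    "II Coding_transcript CDS 7753212 7753418 . - 0 ID=000013;Parent=000002",
    "II Coding_transcript five_prime_UTR 7753419 7753442 . - . ID=000014;Parent=000002",
]
GFF3_TEXT = "\n".join(row.replace(" ", "\t") for row in ROWS)

def parse_gff3(text):
    """Parse GFF3 lines into dicts (minimal: no escapes, single-valued attributes)."""
    features = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        seqid, source, ftype, start, end, score, strand, phase, column9 = line.split("\t")
        attrs = dict(pair.split("=", 1) for pair in column9.split(";") if pair)
        features.append({"type": ftype, "start": int(start), "end": int(end),
                         "strand": strand, "attrs": attrs})
    return features

features = parse_gff3(GFF3_TEXT)
by_id = defaultdict(list)     # ID -> every line carrying that ID
children = defaultdict(list)  # Parent ID -> child lines
for f in features:
    if "ID" in f["attrs"]:
        by_id[f["attrs"]["ID"]].append(f)
    if "Parent" in f["attrs"]:
        children[f["attrs"]["Parent"]].append(f)

mrna_parts = children["000002"]
# On the minus strand, descending start coordinate runs 5' to 3'.
order = [f["type"] for f in sorted(mrna_parts, key=lambda f: f["start"], reverse=True)]
cds_length = sum(f["end"] - f["start"] + 1 for f in mrna_parts if f["type"] == "CDS")

print(len(by_id["000008"]))        # 2: one 3' UTR feature split over two lines
print(order)
print(cds_length, cds_length % 3)  # 348 0
```

The CDS segments are 207, 72 and 69 bp long, which is consistent with the phase column reading 0 on every CDS line.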
Predicted and Retired Genes
Predicted and retired genes are distinguished by the source (column 2) of their CDS features. (Open question: should the source of the gene be changed instead?)
CDS:GeneMarkHMM (ID)
CDS:Genefinder (ID)
CDS:history (ID)
CDS:twinscan (ID)
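This article labels features as type:source, where the type is column 3 and the source is column 2 (as the mRNA and CDS lines of the example show). Since predicted and retired genes differ only in the CDS source, they can be picked out by testing column 2. The sketch below is illustrative: `type_source` and `is_predicted_or_retired_cds` are hypothetical helpers, the source set is copied from the list above, and the `twinscan` record is made up for the demonstration.

```python
# Sources that mark predicted or retired CDS features, copied from the list above.
PREDICTED_OR_RETIRED_SOURCES = {"GeneMarkHMM", "Genefinder", "history", "twinscan"}

def type_source(line):
    """Return the 'type:source' label used in this article for one GFF3 line."""
    columns = line.rstrip("\n").split("\t")
    return f"{columns[2]}:{columns[1]}"

def is_predicted_or_retired_cds(line):
    """True if the line is a CDS whose source marks it as predicted or retired."""
    columns = line.rstrip("\n").split("\t")
    return columns[2] == "CDS" and columns[1] in PREDICTED_OR_RETIRED_SOURCES

# A CDS line from the example in this article.
curated = "II\tCoding_transcript\tCDS\t7752746\t7752814\t.\t-\t0\tID=000009;Parent=000002"
# A made-up record (invented coordinates and ID) for illustration only.
made_up = "II\ttwinscan\tCDS\t1000\t1200\t.\t+\t0\tID=example"

print(type_source(curated), is_predicted_or_retired_cds(curated))  # CDS:Coding_transcript False
print(type_source(made_up), is_predicted_or_retired_cds(made_up))  # CDS:twinscan True
```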
- Non-coding genes (some of them predicted): each is represented by a gene feature and an ncRNA feature. The source field for this group varies, as listed below.
Genes:
gene:Non_coding_transcript (ID)
gene:RNAz (ID)
gene:miRNA (ID)
gene:ncRNA (ID)
gene:rRNA (ID)
gene:scRNA (ID)
gene:snRNA (ID)
gene:snlRNA (ID)
gene:snoRNA (ID)
gene:tRNA (ID)
gene:tRNAscan-SE-1.23 (ID)
ncRNAs:
ncRNA:Non_coding_transcript (ID,Parent,Note)
ncRNA:RNAz (ID,Parent,Note)
ncRNA:miRNA (ID,Parent,Note)
ncRNA:ncRNA (ID,Parent,Note)
ncRNA:rRNA (ID,Parent,Note)
ncRNA:scRNA (ID,Parent,Note)
ncRNA:snRNA (ID,Parent,Note)
ncRNA:snlRNA (ID,Parent,Note)
ncRNA:snoRNA (ID,Parent,Note)
ncRNA:tRNA (ID,Parent,Note)
ncRNA:tRNAscan-SE-1.23 (ID,Parent)
Each ncRNA feature is part of a gene feature, which it names in its Parent attribute.
The following exon and intron features are parts of the transcript (mRNA or ncRNA) features.
exon:Non_coding_transcript (Parent)
exon:miRNA (Parent)
exon:ncRNA (Parent)
exon:rRNA (Parent)
exon:scRNA (Parent)
exon:snRNA (Parent)
exon:snlRNA (Parent)
exon:snoRNA (Parent)
exon:tRNA (Parent)
exon:tRNAscan-SE-1.23 (Parent)
intron:Coding_transcript (Parent,confirmed_cdna,confirmed_est,confirmed_false,confirmed_homology,confirmed_inconsistent,confirmed_unknown,confirmed_utr)
intron:Non_coding_transcript (Parent,confirmed_cdna,confirmed_est,confirmed_false,confirmed_utr)
intron:ncRNA (Parent,confirmed_est)
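Exons and introns are complementary: abutting transcript segments merge into exons, and the gaps between consecutive exons are introns. The sketch below applies this to the CDS and UTR segments of mRNA B0228.5a from the protein-coding example. It is a minimal sketch with hypothetical helper names, using GFF3's 1-based inclusive coordinates. We would expect the intron:Coding_transcript lines for this mRNA to cover the same gaps, but that is an assumption that has not been checked against the file.

```python
def merge_segments(segments):
    """Merge abutting or overlapping (start, end) intervals into exons (1-based, inclusive)."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def introns_between(exons):
    """Return the gaps between consecutive exons, in 1-based inclusive coordinates."""
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]

# CDS and UTR segments of mRNA 000002 (B0228.5a) from the example.
segments = [
    (7752566, 7752645), (7752700, 7752745),                        # three_prime_UTR
    (7752746, 7752814), (7752869, 7752940), (7753212, 7753418),    # CDS
    (7753419, 7753442),                                            # five_prime_UTR
]
exons = merge_segments(segments)
print(exons)
print(introns_between(exons))
```

The 3' UTR's first segment stays a separate exon, while its second segment joins the last CDS segment; likewise the 5' UTR joins the first CDS segment, leaving four exons and three introns.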
- Alignment features: These features consist of multiple parts unified by a common ID attribute.
EST_match:BLAT_EST_BEST (ID,Target)
EST_match:BLAT_EST_OTHER (ID,Target)
RNAi_reagent:RNAi_primary (ID,Target)
RNAi_reagent:RNAi_secondary (ID,Target)
cDNA_match:BLAT_mRNA_BEST (ID,Target)
cDNA_match:BLAT_mRNA_OTHER (ID,Target)
expressed_sequence_match:BLAT_OST_BEST (ID,Target)
expressed_sequence_match:BLAT_OST_OTHER (ID,Target)
nucleotide_match:BLAT_TC1_BEST (ID,Target)
nucleotide_match:BLAT_TC1_OTHER (ID,Target)
nucleotide_match:BLAT_ncRNA_BEST (ID,Target)
nucleotide_match:BLAT_ncRNA_OTHER (ID,Target)
nucleotide_match:TEC_RED (ID,Target)
nucleotide_match:waba_coding (ID,Target)
nucleotide_match:waba_strong (ID,Target)
nucleotide_match:waba_weak (ID,Target)
protein_match:wublastx (ID,Target)
reagent:Expr_pattern (ID,Target)
repeat_region:RepeatMasker (ID,Target)
translated_nucleotide_match:BLAT_NEMATODE (ID,Target,species)
translated_nucleotide_match:BLAT_NEMBASE (ID,Target,species)
translated_nucleotide_match:BLAT_WASHU (ID,Target,species)
translated_nucleotide_match:mass_spec_genome (ID,Target,Note,cds_matches,protein_matches,times_observed)
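Because each alignment is split over several lines that share one ID, a reader has to regroup the lines to recover the whole alignment. In GFF3, the Target attribute has the form `target_id start end [strand]`, with a space between its fields. The sketch below groups parts by ID and parses Target. The helper names are hypothetical, and the two EST_match lines (IDs, target name, coordinates) are invented for illustration, not taken from a WormBase file.

```python
from collections import defaultdict
from urllib.parse import unquote

def parse_target(value):
    """Split a GFF3 Target value 'target_id start end [strand]' into a tuple."""
    fields = value.split(" ")
    strand = fields[3] if len(fields) > 3 else None  # strand is optional
    return unquote(fields[0]), int(fields[1]), int(fields[2]), strand

def group_alignments(lines):
    """Collect the parts of each alignment under their shared ID attribute."""
    groups = defaultdict(list)
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        attrs = dict(pair.split("=", 1) for pair in columns[8].split(";") if pair)
        groups[attrs["ID"]].append({"start": int(columns[3]), "end": int(columns[4]),
                                    "target": parse_target(attrs["Target"])})
    return groups

# Two made-up parts of one EST alignment (all identifiers and coordinates invented).
lines = [
    "I\tBLAT_EST_BEST\tEST_match\t1000\t1099\t.\t+\t.\tID=aln1;Target=EST_X 1 100 +",
    "I\tBLAT_EST_BEST\tEST_match\t1200\t1349\t.\t+\t.\tID=aln1;Target=EST_X 101 250 +",
]
parts = group_alignments(lines)["aln1"]
span = (min(p["start"] for p in parts), max(p["end"] for p in parts))
print(len(parts), span)  # 2 (1000, 1349)
```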
- Other: The remaining features are listed below.
PCR_product:GenePair_STS (pcr_product)
PCR_product:Orfeome (amplified,pcr_product)
PCR_product:Promoterome (pcr_product)
SAGE_tag:SAGE_tag (count,gene,pseudogene,sequence,transcript)
SAGE_tag:SAGE_tag_genomic_unique (count,gene,sequence)
SAGE_tag:SAGE_tag_most_three_prime (count,gene,pseudogene,sequence,transcript)
SAGE_tag:SAGE_tag_unambiguously_mapped (count,gene,pseudogene,sequence,transcript)
SNP:Allele (rflp,status,variation)
binding_site:PicTar (Note)
binding_site:miRanda (Note)
chromosome:Reference (ID,Name)
clone_insert_end:misc_feature (clone)
clone_insert_start:misc_feature (clone)
complex_substitution:Allele (variation)
deletion:Allele (variation)
experimental_result_region:Expr_profile (expr_profile)
experimental_result_region:cDNA_for_RNAi (sequence)
gene:landmark (locus)
insertion:Allele (variation)
inverted_repeat:inverted (Note)
oligo:misc_feature (placeholder_attribute)
operon:operon (operon)
polyA_signal_sequence:polyA_signal_sequence (feature)
polyA_site:polyA_site (feature)
pseudogene:Pseudogene (ID)
pseudogene:history (ID)
reagent:Oligo_set (oligo_set)
region:Genbank (genbank)
region:Genomic_canonical (Note,sequence)
region:Link (sequence)
region:Vancouver_fosmid (sequence)
region:binding_site (feature)
sequence_variant:Allele (variation)
sequence_variant:misc_feature (allele)
substitution:Allele (variation)
tandem_repeat:tandem (Note)
trans_splice_acceptor_site:SL1 (feature)
trans_splice_acceptor_site:SL2 (feature)
transposable_element:Transposon (transposon)
transposable_element:Transposon_CDS (cds)
transposable_element_insertion_site:Allele (variation)
transposable_element_insertion_site:Mos_insertion_allele (variation)
CHANGES
WS177 - add more attributes, pseudogenes, and introns
WS176 - add Parent attribute to CDS:Coding_transcript
WS174 - start documentation