UserGuide:Sequence Page

A Sequence Report page provides those data for a given gene that are most pertinent to its genomic sequence rather than to its more general, classical biology. A schematic diagram is given for the gene's structural organization within its genomic environment; a "Click Here to Browse" link leads to a dynamic view of that environment within the Genome Browser. The schematic diagram will show classical gene names where they exist, structural models for the genes, individual frames of the coding exons in the genome, and STSes, SNPs, and transposons that can be used for genomic mapping or site-selected mutagenesis.

I_seqreport1.jpg

Other genomic data include the gene's genetic identity to a classical locus (where one exists), precise numerical coordinates for its transcriptional unit, the identities of cDNAs matching it, and links to the Protein page for its gene product (where applicable) and to its C. briggsae ortholog (where one is known to exist). The page also gives further details such as the source clone, the sequencing center responsible for the data, microarray data, and remarks on the sequence annotation.

Further down the page are sequence data in FASTA format: the unspliced genomic sequence (with exons in upper case and introns in lower case); the spliced coding sequence in nucleotides; and the predicted protein sequence of the gene's product. FASTA is perhaps the most common and simplest representation of sequence data, and it can be handled by a great many analytical programs.

I_seqreport2.jpg
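Because the unspliced sequence marks exons in upper case and introns in lower case, the exon and intron segments can be recovered from the FASTA text alone. The following Python sketch illustrates the idea. The function names and the short example record are hypothetical, invented for illustration, and are not taken from WormBase; the sketch also assumes the sequence contains only letters.

```python
import re


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_by_case(sequence):
    """Split a mixed-case sequence into ("exon" | "intron", segment) runs.

    Upper-case runs are treated as exons and lower-case runs as introns,
    following the convention described for the unspliced sequence.
    """
    segments = []
    for match in re.finditer(r"[A-Z]+|[a-z]+", sequence):
        run = match.group()
        kind = "exon" if run.isupper() else "intron"
        segments.append((kind, run))
    return segments


def spliced(sequence):
    """Join the upper-case (exon) runs, dropping the lower-case introns."""
    return "".join(seg for kind, seg in split_by_case(sequence) if kind == "exon")


# Hypothetical record, made up for this example (not a real gene).
example = """>hypothetical_gene unspliced
ATGGCTgtaagtttcagGAAT
TCAAAgtatgtttttcagTGA
"""

if __name__ == "__main__":
    for header, seq in parse_fasta(example):
        print(header)
        for kind, segment in split_by_case(seq):
            print(f"  {kind:6s} {len(segment):3d} nt  {segment}")
        print("  spliced:", spliced(seq))
```

For real work, an established library such as Biopython is usually a better choice than a hand-written parser; the sketch only shows how little structure the format requires.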

To aid analysis of these data, a sequence page provides a preformatted BLAST search of that sequence (in DNA or protein form) against the C. elegans genome. To set up the search, simply click the "Blast search" link.

I_seqreport3.jpg

Moreover, each page gives a genetic map position value for its sequence, with a link to a list of nearby genes. For classically studied genes, this map function is mainly useful for designing tests of allelism. However, many genes in C. elegans are identified solely as protein-coding sequences (CDSes) in the genome; for genes of this type, the Mapping Information feature allows the CDS to be interpolated directly onto the classical genetic map, which, in turn, can suggest testable hypotheses of which classical mutations might reside within a given CDS.

I_seqreport4.jpg
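The idea of interpolating a CDS onto the genetic map can be illustrated with a small Python sketch. It assumes simple linear interpolation between the two nearest flanking markers whose physical and genetic positions are both known; this is an assumption made for illustration, not a description of how WormBase actually computes its values. The function name and the marker coordinates are hypothetical.

```python
from bisect import bisect_left


def interpolate_genetic_position(markers, physical_pos):
    """Estimate a genetic map position (cM) for a physical position (bp).

    markers: iterable of (physical_bp, genetic_cM) pairs for loci on one
             chromosome whose positions are known on both maps.
    Assumes linear interpolation between the two flanking markers.
    """
    markers = sorted(markers)
    if not markers:
        raise ValueError("at least one marker is required")
    positions = [bp for bp, _ in markers]
    if not positions[0] <= physical_pos <= positions[-1]:
        raise ValueError("position lies outside the range of known markers")

    i = bisect_left(positions, physical_pos)
    if positions[i] == physical_pos:
        # The position coincides with a marker; no interpolation needed.
        return markers[i][1]

    (bp_lo, cm_lo), (bp_hi, cm_hi) = markers[i - 1], markers[i]
    fraction = (physical_pos - bp_lo) / (bp_hi - bp_lo)
    return cm_lo + fraction * (cm_hi - cm_lo)


# Hypothetical markers, made up for this example: (physical bp, genetic cM).
example_markers = [(1_000_000, -5.0), (2_000_000, -1.0), (4_000_000, 3.0)]

if __name__ == "__main__":
    cds_midpoint = 3_000_000  # hypothetical CDS location in bp
    print(interpolate_genetic_position(example_markers, cds_midpoint))
```

Recombination frequency per base pair varies along a chromosome, so an interpolated value of this kind is only an estimate, best used to shortlist nearby classical mutations for testing.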

Three useful introductory texts on practical sequence analysis of FASTA files (and other data) are:

Gibas, C. and Jambeck, P. (2001). Developing Bioinformatics Computer Skills. O'Reilly, Sebastopol, CA. ISBN: 1-56592-664-1.

Mount, D.W. (2001). Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. ISBN: 0-87969-608-7.

Baxevanis, A.D. and Ouellette, B.F.F., eds. (2001). Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd ed. Wiley-Interscience, New York. ISBN: 0-471-38391-0.