UserGuide:Browse Genome

From WormBaseWiki
Revision as of 18:25, 17 August 2010 by Cgrove (talk | contribs)

Browse Genome OVERVIEW

You have a sequence that you want to work on -- a predicted gene, a cDNA, an STS marker -- or a sequence about which you know some descriptive traits. You want to extract it directly from WormBase and analyse its genomic context in detail. Where to go?

One place to go is the Genome Browser page, which is designed for exactly this problem. Given a word or phrase that corresponds to a known genomic component, the Genome Browser page will return that component if it has been entered into WormBase, or try to guess the closest relevant match if your text partially overlaps an entry in WormBase.

I_seq1.jpg

HOW TO DO A BASIC SEARCH

Type your word or phrase of interest into the "Landmark or Region" box at the top of the search page, then either press Enter or click the "Search" button.

You will get one of three responses: "no match", what might be an exact match, or what might be partial matches.

I_seq2.jpg

THEN WHAT?

For "no match" you will need to come up with some other search. You are most likely to get effective matches if you use one of the vocabularies standard in WormBase -- sequence names in the "cosmid.dot.number" format, canonical gene names, sequence families that are named in the InterPro motif system, DNA coordinates that are realistic...
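As a rough illustration of these vocabularies, the sketch below sorts a search term into one of them before you submit it. The regular expressions, the chromosome list, and the `classify_term` helper are assumptions made for this example; they are not WormBase's actual matching rules, and the sample sequence name only illustrates the "cosmid.dot.number" shape.

```python
import re

# Hypothetical patterns for illustration; WormBase's real matching rules may differ.
SEQUENCE_NAME = re.compile(r"^[A-Z0-9]+\.\d+[a-z]?$")            # "cosmid.dot.number" shape
GENE_NAME = re.compile(r"^[a-z]{3,4}-\d+$")                       # e.g. "unc-1"
COORDINATES = re.compile(r"^(I|II|III|IV|V|X):(\d+)\.\.(\d+)$")   # assumed "chromosome:start..end" shape


def classify_term(term):
    """Guess which vocabulary a search term belongs to."""
    term = term.strip()
    match = COORDINATES.match(term)
    if match:
        start, end = int(match.group(2)), int(match.group(3))
        # "Realistic" here only means positive and correctly ordered.
        return "coordinates" if 0 < start <= end else "unrealistic coordinates"
    if GENE_NAME.match(term):
        return "gene name"
    if SEQUENCE_NAME.match(term):
        return "sequence name"
    return "free text"


for example in ["unc-1", "ZK154.3", "III:1000..2000", "III:2000..1000", "kinase"]:
    print(example, "->", classify_term(example))
```

A term that falls into "free text" or "unrealistic coordinates" is a candidate for rewording before you search again.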

For possible matches to sequences that "may be related", you will either want to check all the results (which can be slow if there are a large number of possible sequences) or redefine your search term in some way that makes it more restrictive.

For a match you consider valid (or at least interesting), you can do several things.

MOVE ALONG THE CHROMOSOME GRAPHICALLY

For a given sequence there will be a graphical overview provided in two layers.

The topmost layer shows the location of the sequence in the overall chromosome, with some well-spaced genetic markers interspersed along the chromosome, and with the whole chromosome length given in megabases of DNA sequence. This view can be browsed graphically by clicking directly on the map, or by clicking buttons at the upper right corner of the map. The buttons allow outward zooming, inward zooming, small left or right steps along the map, and large left or right leaps along the map.

I_seq3.jpg

The bottom layer of the graphic display shows details of the DNA sequence comprising a feature or a feature's immediate vicinity. These features will grow or shrink as the map is zoomed outward or inward, and can be altered to show many different details of the genomic sequence.

I_seq4.jpg

EXAMINE DETAILS OF THE GENOMIC SEQUENCE

The basic view of a sequence given from a successful search hit includes the following features, by default:

1. Which genes have been assigned classical locus names ("Named Genes"; e.g., "unc-1").

2. The structure of each known or predicted transcript, shown as predicted exon segments and intervening introns ("Gene Models"). This is the basis of all the other features. Exons are colored to distinguish the predicted protein-coding segments (in magenta) from the 5' or 3' untranslated regions of protein-coding transcripts (in dark grey).

3. Non-protein-coding genes ("miRNAs").

4. Sets of genes belonging to a single, multicistronic transcription unit ("Operons").

5. Which standard C. elegans clones, used by the C. elegans genome project, overlap the sequence ("YACs & Cosmids").

There are many other possible features that are not shown by default, in order to keep the initial view of a sequence simple. They can be brought to the screen by ticking the appropriate checkbox under "Show features" and then clicking the "Update Image" button. These features are described at:

http://www.wormbase.org/db/seq/gbrowse?help=citations

I_seq5.jpg

One important feature is the "DNA/GC Content" button. At most resolutions this shows the %GC content of the DNA, but at the highest resolution (a 100 bp window) it shows the exact nucleotides of a given region, for both the sense and antisense strands. This is useful for designing experiments that require the exact sequence of a site (clone construction, PCR, etc.).
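To make the two displays concrete, here is a minimal sketch of the underlying arithmetic: %GC over fixed-size windows, and the antisense strand as the reverse complement of the sense strand. The function names and the short example sequence are made up for illustration; this is not the browser's own code, and it assumes an unambiguous A/C/G/T sequence.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_percent(seq):
    """Percentage of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window):
    """%GC for consecutive, non-overlapping windows of the given size."""
    return [gc_percent(seq[i:i + window]) for i in range(0, len(seq), window)]


def reverse_complement(seq):
    """Antisense strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


region = "ATGCGCTTAA"  # made-up sequence, not taken from WormBase
print(gc_percent(region))          # 40.0
print(windowed_gc(region, 5))      # [60.0, 20.0]
print(reverse_complement(region))  # TTAAGCGCAT
```

At low zoom a plot like `windowed_gc` is all that fits on screen; at the 100 bp view the individual bases of both strands can be shown directly.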

Another important feature is the "Briggsae alignments" button. This gives a map of the degree of similarity of genomic sequences in C. elegans to aligned sequences in C. briggsae, with dark blue indicating the strongest matches and pale blue the weakest.

Genome_browser_fig1.png

Moreover, clicking on this graphical display takes you to a view of C. elegans - C. briggsae homology in the Synteny Browser, in which you can view the two aligned genomes at once.

Genome_browser_fig2.png

At some point you may decide that you want to work on the sequence as a text file rather than a graphical Web page. This can be done by "dumping" a textual version of the sequence.
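Once you have a text dump, you can process it with a few lines of code. The sketch below assumes the dump is in FASTA format (a ">" header line followed by lines of sequence); check the format you actually chose when dumping. The `parse_fasta` helper and the example record are hypothetical.

```python
def parse_fasta(text):
    """Return a dict mapping each record name to its joined sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first word after ">".
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Made-up dump text standing in for a file saved from the browser.
dump = ">example_region made-up record\nATGC\nGGTA\n"
sequences = parse_fasta(dump)
print(sequences["example_region"])       # ATGCGGTA
print(len(sequences["example_region"]))  # 8
```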

ADDING ANNOTATIONS TO THE GENOME

The "Upload your own annotations" and "Add remote annotations" features have two purposes: to allow an individual researcher to use WormBase, in effect, as an electronic notebook for recording private comments on a genomic feature of interest; and to allow expert researchers to contribute specific annotations in their areas of knowledge.

For more details on how to do this, see:

http://www.wormbase.org/db/seq/gbrowse?help=annotation

http://www.wormbase.org/db/seq/gbrowse?help=annotation#remote
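
The help pages above define the file formats the upload feature accepts; this guide does not reproduce them. Purely as an illustration of what a line in an annotation file can look like, the sketch below builds one line in GFF3, a common tab-delimited format for genome annotations. Whether the WormBase upload accepts GFF3 is an assumption to verify against the help page. The `gff3_line` helper and all values passed to it are hypothetical.

```python
def gff3_line(seqid, source, feature_type, start, end, strand, attributes):
    """Build one nine-column GFF3 line (score and phase left as '.')."""
    if start < 1 or end < start:
        raise ValueError("GFF3 coordinates are 1-based with start <= end")
    if strand not in {"+", "-", ".", "?"}:
        raise ValueError("strand must be '+', '-', '.' or '?'")
    # Simplification: reserved characters in attribute values are not escaped.
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, feature_type, str(start), str(end),
               ".", strand, ".", attr_text or "."]
    return "\t".join(columns)


# Made-up private note on a region of chromosome III.
print(gff3_line("III", "my_lab", "region", 1000, 2000, "+",
                {"Name": "candidate_site"}))
```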