UserGuide:Genome Dumper

Genome Dumper

WormBase provides large sets of genomic sequence data through its Genome Dumper. This can be useful for getting data sets like the complete set of 5' flanking regions for all genes.

To run this service, do one of the following: select a pre-defined search; type in a list of sequence or chromosome names; or upload a text file containing sequence or chromosome names. Note that uploaded files must be plain text to be read; Word documents are likely to fail (because they are uploaded as the full document file, with Microsoft-specific code, rather than as the embedded text alone).
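One way to be sure an upload file really is plain text is to generate it with a short script instead of a word processor. The sketch below is only an illustration: the function name write_name_list is hypothetical, and the one-name-per-line layout is an assumption, not a documented requirement of the Genome Dumper.

```python
from pathlib import Path


def write_name_list(names, path):
    """Write names to a plain ASCII text file and return how many were written.

    Assumption: one name per line. Blank entries and repeats are dropped.
    A name containing non-ASCII characters raises UnicodeEncodeError, which
    is a useful warning that the list was copied from a formatted document.
    """
    cleaned = []
    for name in names:
        name = name.strip()
        if name and name not in cleaned:
            cleaned.append(name)
    text = "".join(name + "\n" for name in cleaned)
    Path(path).write_text(text, encoding="ascii")
    return len(cleaned)
```

For example, write_name_list(["I", "II", "X"], "names.txt") produces a three-line file that can be opened in any text editor to confirm its contents before uploading.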

In the "Features" menu, pick the feature that you would like to extract from WormBase. The current default is "Gene Models", which bases the search on the structures of known or predicted genes. These genes encode either protein products (most of them) or non-protein-coding RNA products (e.g., miRNAs).

In the "Options" menu, select the alternatives that suit your purpose. For most purposes you will probably want the "FastA" default, which gives output in the FASTA sequence format. Unless you want both a flanking sequence and the gene sequence linked to it, you should check the option "show only flanking sequences". Plain (ASCII) text is often preferable to HTML, so unchecking the "As HTML" box (which is checked by default) may be advisable.
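If a dump has already been saved with the "As HTML" box checked, the markup can usually be stripped afterwards. The following is a minimal sketch using Python's standard html.parser module; the function name html_to_text is hypothetical, and the sketch assumes nothing about the actual HTML layout that the Genome Dumper produces. It simply keeps all text between tags, including the contents of any script or style elements.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, discarding the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &gt; back into ">".
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)
```

Rerunning the dump with "As HTML" unchecked remains the simpler route when the download is not too large to repeat.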

Then click the "DUMP" link. The results will be loaded into a window of whichever Web browser you are using. In many cases the download will be large (for instance, in a search for flanking sequences of "all genes"). Expect large downloads to take some time (up to an hour); it is important not to interrupt the download by closing or navigating away from the browser window, or by pressing the browser's "Stop" button prematurely.

Save the final download to a text file on your computer, and check the format of the resulting text. If you chose the "Verbose" option, you may need to reformat the headers of the sequences to get true FASTA formatting. This can probably be done most easily with a text editor or word processor that supports text replacement across an entire file.
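Checking the format of a large download by eye is tedious, so a small script can help. The sketch below is an illustration only: the function name check_fasta is hypothetical, and it assumes nucleotide sequences made up of A, C, G, T, and N in either case. It reports lines that do not fit the basic FASTA layout of a ">" header followed by sequence lines, and it does not attempt to rewrite "Verbose" headers, whose exact layout is not described here.

```python
def check_fasta(text, alphabet="ACGTNacgtn"):
    """Return a list of (line_number, message) problems; empty if none found."""
    problems = []
    allowed = set(alphabet)
    seen_header = False
    has_sequence = True  # does the current record have a sequence line yet?
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        line = line.rstrip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            if seen_header and not has_sequence:
                problems.append((number, "previous record has no sequence"))
            if len(line) == 1:
                problems.append((number, "empty header"))
            seen_header = True
            has_sequence = False
        elif not seen_header:
            problems.append((number, "text before first '>' header"))
        else:
            bad = set(line) - allowed
            if bad:
                message = "unexpected characters: " + "".join(sorted(bad))
                problems.append((number, message))
            has_sequence = True
    if seen_header and not has_sequence:
        problems.append((len(lines), "last record has no sequence"))
    if not seen_header:
        problems.append((0, "no '>' header found"))
    return problems
```

An empty result means that every record starts with a ">" header and contains only the expected characters. Any reported line numbers show where to look in a text editor.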

One important feature of the Genome Dumper is the option of recursive searches. To use this, check the "Paste results back into Sequences to Search" box in the "Options" menu before carrying out a genome dump.
