BASIC PROTOCOL 10: MINING GENE DATA WITH WORMMART

Web pages for a single Gene, Sequence, or Protein allow only relatively small amounts of data to be analyzed at a time; yet the rise of functional genomics has made it both useful and necessary for biologists to handle large, complex data sets covering tens to thousands of genes in their everyday work. To support this, WormBase provides tools for wrangling large data sets. The most recently designed and most generally usable of these is WormMart, based on the BioMart search engine used by Ensembl (Kasprzyk et al., 2004). WormMart is a Web interface that allows users to design and run complex database searches without having to use (or even know) a database query language.

Necessary Resources

Hardware

   A standard computer with a reasonably fast connection to the Internet (cable modem, DSL, or Ethernet recommended)

Software

   Web browser such as Internet Explorer (http://www.microsoft.com) or Mozilla (http://www.mozilla.org/firefox)

1. Go to the WormMart page by clicking its link at the top left center of the front page (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_01.png). Once on the first WormMart page, choose a version of WormBase (either the most recent release, or a permanently archived release such as WS140). Then choose a Dataset to search, such as genes, expression patterns, research papers, phenotypes, or "variations" (i.e., mutations or RNAi). Click the "next" button to go to the Filter page.

2. Decide how to filter the chosen data set.

Fig_1_8_16.png

For instance, the Genes data set can be filtered by species, gene names or classes, the reliability of the gene's structural prediction, whether the gene is protein-coding or not, genomic location and DNA strand orientation, inclusion or exclusion of 5' and 3' flanking sequences, and RNAi phenotype. Other data sets can be filtered in analogous ways. Moreover, WormMart allows more than one data set to be filtered in a single search. Although the Filter page gives many choices, it is easy to undo one's choices and explore a different query by clicking the "back" button. Once the data filters are satisfactory, click the "next" button to go to the Output page.

3. On the Output page, decide how to best select and present the filtered data. First, choose an "Attribute". For genes, these attributes can be "Features", "Structures", or "Sequences". Features include short identifiers, genomic locations, functional annotations, available reagents, mutant alleles, or references; Structures include exons, introns, and 5' or 3' untranslated regions; and Sequences are pure nucleotide sequences selected from structural elements or from DNA-based reagents such as cDNAs.

Fig_1_8_17.png

After selecting the details for such an output, choose an output format such as HTML, plain text, Excel table, or compressed file, and click "export".
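
The choices made in steps 1 to 3 (a data set, a set of filters, a set of output attributes, and an output format) correspond to the parts of a BioMart-style XML query document. The Python sketch below assembles such a document, only to make that structure concrete. It is an illustration under stated assumptions, not a description of how WormMart itself is implemented: the data set, filter, and attribute names used here are hypothetical placeholders, and whether a given WormMart release accepts XML queries directly is not stated in this protocol.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, filters, attributes, formatter="TSV"):
    """Return a BioMart-style XML query as a string.

    dataset    -- name of the data set to search (step 1)
    filters    -- dict mapping filter name to filter value (step 2)
    attributes -- list of output attribute names (step 3)
    formatter  -- output format label (step 3)
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,
        "header": "1",       # ask for a header row in the output
        "uniqueRows": "1",   # ask for duplicate rows to be collapsed
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # One <Filter> element per restriction chosen on the Filter page.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # One <Attribute> element per column chosen on the Output page.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# All names below are hypothetical placeholders, not real WormMart identifiers.
xml_text = build_query(
    dataset="example_gene_dataset",
    filters={"example_species_filter": "Caenorhabditis elegans"},
    attributes=["example_gene_name", "example_chromosome"],
)
print(xml_text)
```

Seeing the query written out this way can make it easier to keep track of which choices belong to the Filter page and which belong to the Output page when revising a search.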

4. Examine the results.

Fig_1_8_18.png

If the results seem promising but are not quite what is needed, go back one or more steps, try somewhat different filter and output choices, and compare the results. The best way to get an intuitive sense for this interface is probably to tinker with it at first; with practice, it can be a very powerful and flexible tool.
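
When results are exported as plain text, a quick tally of the rows can help in judging whether a filter was too strict or too loose before going back to adjust it. The following Python sketch assumes that the export is tab-delimited with a header row. The column names and the sample rows are made-up placeholders for illustration, not real WormMart output.

```python
import csv
import io
from collections import Counter


def summarize_export(handle, column):
    """Count the rows in a tab-delimited export and tally one column.

    handle -- an open text file (or file-like object) with a header row
    column -- header name of the column to tally
    Returns (total_rows, Counter of values in that column).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    total = 0
    for row in reader:
        total += 1
        # Empty or absent cells are grouped under a single label.
        counts[row.get(column) or "(missing)"] += 1
    return total, counts


# Made-up placeholder data standing in for an exported file.
sample = (
    "Gene name\tChromosome\n"
    "geneA\tIV\n"
    "geneB\tIII\n"
    "geneC\tIV\n"
)
total, per_chromosome = summarize_export(io.StringIO(sample), "Chromosome")
print(total, dict(per_chromosome))
```

For a real export, the io.StringIO object would be replaced by a file opened with open(path, newline=""), where path is wherever the exported file was saved.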