Searching WormBase for Information About C. elegans (Wiley)

From WormBaseWiki
Revision as of 22:41, 29 November 2010 by Cgrove

INTRODUCTION

WormBase is the major public biological database for the nematode Caenorhabditis elegans (Chen et al., 2005; PMID: 15608221). It is meant to be useful to any biologist who wants to use C. elegans, whatever his or her specialty. WormBase contains information about the genomic sequence of C. elegans, its genes and their products, and its higher-level traits such as gene expression patterns and neuronal connectivity. WormBase also contains the genomic sequences and gene structures of C. briggsae and C. remanei, two closely related nematodes. These data are interconnected, so that a search beginning with one object (such as a gene) can lead to related objects of a different type (such as the DNA sequence of the gene, or the cells in which the gene is active). Users can also run complex searches that return entire sets of records.

WormBase is constantly being changed and expanded, both by curation of newly available data and by modifications to the user interface. The entire database is updated and rebuilt into a new release every 3 weeks throughout the year; releases are named "WSnnn", where "nnn" was 147 as of this writing. To give bioinformaticians reproducible data sets for their analyses, every tenth release (roughly once every seven months) is made permanently available as a frozen online archive. Archives exist for every tenth release beginning with WS100. All of the information in this chapter is based on the version of WormBase available in September 2005 (WS147 release; BioPerl/Generic Model Organism Database software).
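The release arithmetic above can be checked with a short Python sketch. It assumes, as a simplification, that releases appear exactly every 3 weeks and that every release whose number is divisible by 10 is archived; the constant and function names are hypothetical.

```python
# Sketch of the WormBase release/archive arithmetic (simplified assumptions:
# a release exactly every 3 weeks, and every tenth release is archived).
WEEKS_PER_RELEASE = 3
RELEASES_PER_ARCHIVE = 10
WEEKS_PER_MONTH = 52 / 12  # average month length in weeks (~4.33)


def is_archived(release_number: int) -> bool:
    """True if release WSnnn would be kept as a frozen archive."""
    return release_number % RELEASES_PER_ARCHIVE == 0


def months_between_archives() -> float:
    """Interval between consecutive frozen archives, in months."""
    weeks = WEEKS_PER_RELEASE * RELEASES_PER_ARCHIVE  # 30 weeks
    return weeks / WEEKS_PER_MONTH


def archived_releases(first: int, current: int) -> list[str]:
    """Names of archived releases from `first` up to and including `current`."""
    return [f"WS{n}" for n in range(first, current + 1) if is_archived(n)]


print(f"{months_between_archives():.1f} months between archives")  # 6.9 months between archives
print(archived_releases(100, 147))  # ['WS100', 'WS110', 'WS120', 'WS130', 'WS140']
```

Under these assumptions the archive interval is 30 weeks, or about 6.9 months, which agrees with the text's "roughly once every seven months".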

The protocols described in this chapter include the following: general searches of WormBase with single search terms; studying a gene, sequence, or protein with its individual web page, or with the Genome Browser; searching for proteins by BLAST hits, sequence motifs, or Gene Ontology terms; aligning C. elegans with C. briggsae genomic sequences; detailed, user-customized searches with WormMart or AceDB Query Language; batch downloads of many sequences at once; identifying the genomic regions, genetic contents, or molecular clones spanning a defined chromosomal interval; electronic PCR; finding expression patterns, and the cell types or developmental origins from which these patterns arise; and searching for genome-wide RNAi results yielding particular phenotypes. Some advice is also provided for installing and running WormBase on a local computer.


BASIC PROTOCOL 1: NAVIGATING THE WormBase HOME PAGE

Necessary Resources

Hardware

   A standard computer with a reasonably fast connection to the Internet (cable modem, DSL, or Ethernet recommended)

Software

   Web browser such as Internet Explorer (http://www.microsoft.com) or Firefox (http://www.mozilla.org/firefox)

The home page for the main WormBase site, http://wormbase.org, allows general searches, gives links to specialized searches, and has news about improvements to WormBase.


Figure 1.8.1: The WormBase home page (Fig_1_8_01.png).


This page is divided into several parts. At the top is a large, simple menu bar giving quick access to six popular search pages, a Web form for submitting comments or new data, an all-purpose Searches page, and a Site Map. This bar appears at the top of every WormBase page and also includes a link back to the home page (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_01.png). Immediately below this menu bar on the home page are the number of the data release currently in use and the official WormBase logo.

The next section of the main page has fields and menus that are used for a basic search of the full database (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_01.png). A user can choose searches from any of over 20 different data types from the "Search for" drop-down menu (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_01.png). The check boxes immediately beneath the search fields offer choices to require strict identity for search terms, to give results in XML format, or to search primary research articles in depth with Textpresso (http://www.textpresso.org; Muller et al., 2004).

Below the basic search section are a directory for the entire WormBase site (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_01.png) and news about C. elegans bioinformatics. Near the bottom of the page is a Links section, with connections to other C. elegans sites that complement WormBase, and a list of other sites running WormBase itself (the development server, a data-mining server, official mirror sites, and archival data freeze sites for http://wormbase.org). Finally, the bottom of the page has a link for sending comments or questions to the WormBase staff, plus links to WormBase policies on copyright and privacy; these links appear on every WormBase page.

WormBase has a development site (http://dev.wormbase.org) that uses the very latest data release and site software, while the main site lags by one release. The two sites are closely similar, but the main site emphasizes stability and the development site novelty.

The examples of searches given in the following protocols are intended to be illustrative, but not exhaustive; many other searches are possible.


BASIC PROTOCOL 2: PERFORMING A DATABASE SEARCH

This protocol describes how to conduct a general search of WormBase.

Necessary Resources

Hardware

   A standard computer with a reasonably fast connection to the Internet (cable modem, DSL, or Ethernet recommended)

Software

   Web browser such as Internet Explorer (http://www.microsoft.com) or Firefox (http://www.mozilla.org/firefox)

1. Go to the main page, either by entering its URL http://wormbase.org, or by clicking on the Home link at the top left of any WormBase Web page (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_01.png).

2. Type a word or phrase into the search form (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_01.png). To search the entire database, select Anything in the "Search for" pull-down menu, then click the Search button.

3. Once the search has run, examine the results (if any). If a single data record is found, the search routine will sometimes go directly to the Web page for that record (it does this automatically for Gene pages). More often, however, a search returns 0 or ≥2 results. In the latter case, the search gives one or more summary pages listing the data records found.

Figure 1.8.2 (Fig_1_8_02.png).

4. Look at the page of summarized results, and consider whether there are too few or too many. If there are too few results, try different key words; computers are literal-minded, and a search word that is close to, but not exactly, a term in the database may not be recognized. Conversely, if there are too many results, restrict the search to a subset of the database with the "Search for" pull-down menu (e.g., "Any Gene"; http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_01.png) or check the "Exact match" box, and resubmit the search by clicking the Search button.

5. If the search results look reasonable, then follow the hypertext links to the data records themselves.

6. To get a list of many data records all falling into a specific class, pick a subset of the database and search with a wildcard. For instance, if the "Any Gene" search is selected, and run with unc-* as the search term, WormBase will return links to 117 Locus records (snt-1 through vab-8, with 114 unc-* hits). When possible, WormBase recognizes and returns synonyms in searches (which is why a search for unc-* genes returns three non-unc-* gene names; these three genes have aliases of unc-107, unc-110, and unc-121).

7. Remember that search words can be anything of interest. Although there is no guarantee that the database will return hits for a given search string, searching directly for a topic of interest can yield useful results.

For instance, consider a search for "hyperplasia" under all categories (i.e., after choosing Anything from the pull-down menu; http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_01.png). Hyperplasia is a topic normally associated with cancer biology or endocrinology (Merke and Bornstein, 2005; Simpson et al., 2005), but one might want to explore the potential for C. elegans to serve as a model for the control of tissue proliferation. A search with this term returns nine hits, including six genetic loci that either have hyperplastic phenotypes or are homologs of human disease genes with roles in deregulated proliferation.
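The search behavior described in steps 3, 4, and 6 (direct Gene pages versus summary pages, narrowing by data type or exact match, and wildcard searches that also match synonyms) can be modeled in a few lines of Python. This is a hypothetical sketch, not WormBase's code: the `Record` class, the `matches`, `search`, and `present` functions, and all record names are invented for illustration; only `fnmatchcase` comes from Python's standard `fnmatch` module. Because the text says single non-Gene hits are only sometimes shown directly, the sketch always sends them to a summary page, and it switches to wildcard matching whenever the term contains `*`.

```python
# Hypothetical model of the WormBase search steps; not WormBase's own code.
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Record:
    """A hypothetical database record: name, data type, and aliases."""
    name: str
    data_type: str  # e.g. "Gene", "Antibody"
    aliases: list[str] = field(default_factory=list)


def matches(term: str, text: str, exact: bool) -> bool:
    """Case-insensitive match; a term containing '*' is treated as a wildcard."""
    term, text = term.lower(), text.lower()
    if "*" in term:
        return fnmatchcase(text, term)  # wildcard mode ignores `exact`
    return text == term if exact else term in text


def search(term: str, records: list[Record],
           data_type: Optional[str] = None, exact: bool = False) -> list[Record]:
    """Return records of the chosen type whose name or any alias matches."""
    hits = []
    for rec in records:
        if data_type is not None and rec.data_type != data_type:
            continue  # outside the class chosen in "Search for"
        if any(matches(term, t, exact) for t in [rec.name, *rec.aliases]):
            hits.append(rec)
    return hits


def present(hits: list[Record]) -> str:
    """Simplified view of what the user sees for a set of hits."""
    if not hits:
        return "no results: try different key words"
    if len(hits) == 1 and hits[0].data_type == "Gene":
        return f"Gene page for {hits[0].name}"  # single Gene hit: go straight there
    return f"summary page listing {len(hits)} record(s)"


# Hypothetical records; "xyz-1" stands in for a gene known by an unc-* alias.
RECORDS = [
    Record("unc-1", "Gene"),
    Record("unc-2", "Gene"),
    Record("xyz-1", "Gene", aliases=["unc-999"]),
    Record("unc-1 antibody", "Antibody"),
]

print(present(search("unc-1", RECORDS)))  # summary page listing 2 record(s)
print(present(search("unc-1", RECORDS, data_type="Gene", exact=True)))  # Gene page for unc-1
print(sorted(r.name for r in search("unc-*", RECORDS, data_type="Gene")))  # ['unc-1', 'unc-2', 'xyz-1']
```

Keeping the type filter and the exact-match test as separate arguments mirrors the two independent controls on the search form, the "Search for" menu and the "Exact match" box; WormBase's actual matching rules may differ.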


BASIC PROTOCOL 3: EXAMINING A GENE IN C. ELEGANS

BASIC PROTOCOL 4: EXAMINING A MOLECULAR SEQUENCE IN C. ELEGANS

BASIC PROTOCOL 5: FINDING PROTEIN FEATURES

BASIC PROTOCOL 6: SEARCHING FOR GENE PRODUCTS WITH PARTICULAR SEQUENCE MOTIFS

BASIC PROTOCOL 7: USING THE GENOME BROWSER

BASIC PROTOCOL 8: VIEWING THE C. BRIGGSAE GENOME AND ITS SYNTENY WITH C. ELEGANS

BASIC PROTOCOL 9: FINDING SEQUENCE SIMILARITIES WITH BLAST

BASIC PROTOCOL 10: MINING GENE DATA WITH WORMMART

BASIC PROTOCOL 11: DOWNLOADING A BATCH OF SEQUENCES

BASIC PROTOCOL 12: EXAMINING THE GENOMIC CONTENT OF A CLASSICAL GENETIC INTERVAL

ALTERNATE PROTOCOL 1: INSTALLING AND RUNNING WormBase LOCALLY

COMMENTARY

KEY REFERENCES

INTERNET RESOURCES

FIGURE(S)