From WormBaseWiki
Revision as of 21:20, 23 September 2007 by Tharris (talk | contribs)

In addition to Gene and Sequence pages, WormBase provides Protein records that give the known or predicted details of a gene's protein product(s).

Necessary Resources


   A standard computer with a reasonably fast connection to the Internet (cable modem, DSL, or Ethernet recommended)


   Web browser such as Internet Explorer or Firefox

1. From the front WormBase page, select "Protein, C. elegans" from the "Search for" pull-down menu, then enter the name of a protein of interest in the adjacent text field and click the Search button.

2. A successful search will lead directly to a "Protein report" for the protein requested.


This is the Protein page for that polypeptide in WormBase.

3. Alternatively, from a Gene or Sequence page, click on the link given for Corresponding Protein(s).

For example, the Gene page for zyg-1 and the Sequence page for F59E12.2 both link to the Protein Report page for WP:CE28571.

WormBase uses a numbering system for C. elegans proteins with the format WP:CExxxxx, where xxxxx is a five-digit number from 00001 to 99999. (For a search of WormBase, CExxxxx is sufficient.) A CE number is unique to a particular peptide sequence, and a new CE number is generated whenever a change or expansion in predicted gene structures implies a new conceptual protein sequence. Because a CE number is tied to the peptide sequence itself, it prevents the confusion that changes in Sequence nomenclature could otherwise cause.
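
The identifier format described above can be checked mechanically. The following Python sketch is only an illustration: the function name normalize_ce_id and the choice to accept lowercase input are assumptions, not part of WormBase. It accepts either the full WP:CExxxxx form or the shorter CExxxxx form and returns both the full identifier and the short search term.

```python
import re

# Assumed pattern, based only on the format described in the text:
# an optional "WP:" prefix, then "CE" followed by exactly five digits.
_CE_PATTERN = re.compile(r"(?:WP:)?(CE[0-9]{5})", re.IGNORECASE)


def normalize_ce_id(raw):
    """Return (full_id, search_term) for a CE protein identifier.

    normalize_ce_id is a hypothetical helper, not a WormBase function.
    Raises ValueError if the text does not fit the WP:CExxxxx format.
    """
    match = _CE_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a CE protein identifier: {raw!r}")
    short = match.group(1).upper()
    # The text gives the numeric range as 00001 to 99999, so 00000 is rejected.
    if short == "CE00000":
        raise ValueError("CE numbers start at 00001")
    return "WP:" + short, short


if __name__ == "__main__":
    full_id, search_term = normalize_ce_id("WP:CE28571")
    print(full_id, search_term)
```

The second value returned is the form that the text says is sufficient for a WormBase search.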

4. Examine the page for protein characteristics of interest.

A Protein page contains several different classes of information. For a given protein, where applicable, WormBase will show the following: motifs from InterPro (Mulder et al., 2005) and Pfam (Bateman et al., 2004; UNIT 2.5); regions of low-complexity sequence (detected with SEG; Wootton, 1994); predicted transmembrane domains (via TMHMM; Krogh et al., 2001) or coiled-coil domains (through NCOILS; Lupas, 1997); the amino acid sequence; and the protein's length, estimated isoelectric point, estimated molecular weight, and residue composition.
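
Several of the sequence-derived values listed above (length, residue composition, and an approximate molecular weight) can be estimated directly from an amino acid sequence. The Python sketch below is a generic textbook calculation, not WormBase's own method, which the text does not describe. The function name summarize_protein is hypothetical, and the mass table holds standard average residue masses for the 20 common amino acids. Isoelectric point estimation depends on a chosen set of pKa values and is left out.

```python
from collections import Counter

# Standard average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free chain termini


def summarize_protein(sequence):
    """Return length, residue composition, and approximate molecular weight.

    summarize_protein is a hypothetical helper for illustration only.
    """
    # Drop whitespace and line breaks so pasted multi-line sequences work.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = sorted(set(seq) - set(AVERAGE_RESIDUE_MASS))
    if unknown:
        raise ValueError(f"unsupported residue codes: {unknown}")
    composition = Counter(seq)
    mass = WATER_MASS + sum(
        AVERAGE_RESIDUE_MASS[residue] * count
        for residue, count in composition.items()
    )
    return {
        "length": len(seq),
        "composition": dict(composition),
        "molecular_weight": mass,
    }


if __name__ == "__main__":
    # Made-up peptide used only to exercise the function.
    print(summarize_protein("MKVLAAG"))
```

Values computed this way may differ slightly from those shown on a Protein page, since the mass table and rounding used there are not stated.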

In addition to features inherent to a protein sequence, WormBase also gives the protein's precomputed BLASTP search hits against a number of databases. These databases include C. elegans itself (to detect paralogs within the worm genome); Saccharomyces cerevisiae (from SGD; Balakrishnan et al., 2005); Drosophila melanogaster (from Gadfly/FlyBase; Drysdale et al., 2005); human beings (from Ensembl; Hubbard et al., 2005); and a nonredundant subset of proteins from SWISS-PROT and TrEMBL (Bairoch et al., 2005). Similarities detected by BLASTP are diagrammed alongside motifs and other sequence features.