BASIC PROTOCOL 5: FINDING PROTEIN FEATURES
In addition to Gene and Sequence pages, WormBase has Protein pages that give the known or predicted details of a gene's protein product(s).
Necessary Resources
Hardware
A standard computer with a reasonably fast connection to the Internet (cable modem, DSL, or Ethernet recommended)
Software
Web browser such as Internet Explorer (http://www.microsoft.com) or Firefox (http://www.mozilla.org/firefox)
1. From the front WormBase page, select "Protein, C. elegans" from the "Search for" pull-down menu, then enter the name of a protein of interest in the adjacent text field and click the Search button.
2. A successful search leads directly to a "Protein Report" for the requested protein.
This is the Protein page for that polypeptide in WormBase.
3. Alternatively, from a Gene or Sequence page, click on the link given for Corresponding Protein(s).
For example, the Gene page for zyg-1 (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_03.png) and the Sequence page for F59E12.2 (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_05.png) both link to the Protein Report page for WP:CE28571.
WormBase numbers C. elegans proteins in the format WP:CExxxxx, where xxxxx is a five-digit number from 00001 to 99999. (When searching WormBase, the CExxxxx part alone is sufficient.) A CE number is unique to a particular peptide sequence, and a new CE number is assigned whenever a change or expansion in predicted gene structures implies a new conceptual protein sequence. Because CE numbers are tied to sequences, they identify a protein unambiguously even where changes in Sequence nomenclature might otherwise cause confusion.
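If protein identifiers are being collected in a script before searching, a small normalizer can accept either the full WP:CExxxxx form or the bare CExxxxx form and return the bare form used for searches. This is a minimal sketch that assumes only the format described above ("CE" plus exactly five digits, 00001 to 99999); the function name `normalize_ce_id` is hypothetical and not part of any WormBase tool.

```python
import re

# Format taken from the protocol text: optional "WP:" prefix, then "CE"
# followed by exactly five digits. Hypothetical helper, not a WormBase API.
_CE_PATTERN = re.compile(r"^(?:WP:)?(CE\d{5})$")


def normalize_ce_id(identifier: str) -> str:
    """Return the bare CExxxxx form of a WormBase protein ID, or raise ValueError."""
    match = _CE_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a WP:CExxxxx identifier: {identifier!r}")
    number = match.group(1)
    if number == "CE00000":
        # The numbering described in the text starts at 00001.
        raise ValueError("CE numbers run from 00001 to 99999")
    return number


if __name__ == "__main__":
    for raw in ["WP:CE28571", "CE28571"]:
        print(raw, "->", normalize_ce_id(raw))
```

Both example inputs normalize to CE28571, the protein linked from the zyg-1 Gene page and the F59E12.2 Sequence page.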
4. Examine the page for protein characteristics of interest.
A Protein page contains several different classes of information (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_06.png). For a given protein, where applicable, WormBase shows the following: motifs from InterPro (Mulder et al., 2005) and Pfam (Bateman et al., 2004; UNIT 2.5); regions of low-complexity sequence (detected with SEG; Wootton, 1994); predicted transmembrane domains (via TMHMM; Krogh et al., 2001) or coiled-coil domains (via NCOILS; Lupas, 1997); the amino acid sequence; and the protein's length, estimated isoelectric point, estimated molecular weight, and residue composition.
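The simpler summary statistics on a Protein page (length, residue composition, molecular weight, and isoelectric point) can be estimated from the amino acid sequence alone. The sketch below is an illustration under stated assumptions, not WormBase's own method: it uses approximate average residue masses in the style of ExPASy, one common set of pKa values (tools differ, so the pI will not match WormBase exactly), the Henderson-Hasselbalch equation for net charge, and bisection to find the pH of zero net charge. It accepts only the 20 standard one-letter residue codes.

```python
from collections import Counter

# Approximate average residue masses in daltons (ExPASy-style values).
# Assumption: WormBase may use slightly different masses.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # one water for the free N- and C-termini

# Assumed pKa values (one common set; different tools use different sets).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the chain's net charge at a given pH."""
    counts = Counter(seq)
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH 0-14; net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still net positive, so the pI lies above mid
        else:
            high = mid
    return (low + high) / 2


def summarize(seq: str) -> dict:
    """Length, molecular weight, pI, and residue counts for one sequence."""
    seq = seq.upper()
    return {
        "length": len(seq),
        "mw_da": round(molecular_weight(seq), 1),
        "pi": round(isoelectric_point(seq), 2),
        "composition": dict(sorted(Counter(seq).items())),
    }


if __name__ == "__main__":
    print(summarize("MKTAYIAKQR"))  # toy sequence, not a WormBase protein
```

Bisection is used because net charge decreases monotonically with pH, so a single zero crossing is guaranteed between pH 0 and 14.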
In addition to features inherent to a protein sequence, WormBase also shows the protein's precomputed BLASTP search hits against several databases. These include C. elegans itself (to detect paralogs within the worm genome); Saccharomyces cerevisiae (from SGD; Balakrishnan et al., 2005); Drosophila melanogaster (from Gadfly/FlyBase; Drysdale et al., 2005); Homo sapiens (from Ensembl; Hubbard et al., 2005); and a nonredundant subset of proteins from SWISS-PROT and TrEMBL (Bairoch et al., 2005). Similarities detected by BLASTP are diagrammed alongside motifs and other sequence features (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_06.png).
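WormBase's BLASTP hits are precomputed and displayed on the page, but a similar per-database summary can be made from a local BLASTP run. The sketch below assumes NCBI BLAST+ tabular output (`-outfmt 6`, default 12 columns) with one result file per target database; the database labels and all IDs in the usage example are made up for illustration, and the "best hit" rule (highest bit score, ties broken by lower E-value) is one reasonable choice, not WormBase's.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per data line, skipping blank and '#' comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        yield Hit(row["qseqid"], row["sseqid"], float(row["pident"]),
                  float(row["evalue"]), float(row["bitscore"]))


def best_hit_per_database(results: Dict[str, Iterable[str]]) -> Dict[str, Optional[Hit]]:
    """For each database label, keep the top hit (or None if there were no hits)."""
    best: Dict[str, Optional[Hit]] = {}
    for db, lines in results.items():
        hits = list(parse_tabular(lines))
        best[db] = max(hits, key=lambda h: (h.bitscore, -h.evalue)) if hits else None
    return best


if __name__ == "__main__":
    # Made-up example rows; real IDs and scores come from your own BLASTP run.
    yeast = ["query1\tyeastA\t38.5\t250\t140\t6\t10\t255\t5\t250\t2e-30\t120.0",
             "query1\tyeastB\t30.1\t200\t130\t9\t20\t215\t8\t205\t1e-10\t60.2"]
    fly = ["# no hits found"]
    for db, hit in best_hit_per_database({"yeast": yeast, "fly": fly}).items():
        print(db, hit.subject if hit else "no hit")
```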