BASIC PROTOCOL 3: EXAMINING A GENE IN C. ELEGANS
A standard computer with a reasonably fast connection to the Internet (cable modem, DSL, or Ethernet recommended)
From the front WormBase page, enter the name of a gene of interest and click Search (leaving the Find menu set to its default, "Any Gene"). A successful search will lead directly to a Gene page for the gene requested.
A Gene page is intended to summarize everything of biological importance known about a given gene; it can thus be quite complex, although it is also meant to be concise (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_03.png).
The page can describe any or all of the following information, when available: identity of the gene's product; normal function of the gene's product; orthologs of the gene (if any); the gene's meiotic and physical location within the genome; its phenotype in classical mutations or RNAi screens; its spatial and temporal expression pattern, with microarray data; domains of its encoded protein; Gene Ontology (UNIT 7.2) terms describing its function; mutant alleles and strains carrying them; homologs identified by best BLASTP (UNIT 3.4) scores versus other proteomes; cDNA, STS, and antibody reagents; microarray probes; SAGE oligonucleotides; and references from the primary C. elegans research literature. All of these are given with hypertext links allowing the user to find more information on a given datum of interest.
Most of this information is given as tabulated text. However, for cloned genes, there will also be a schematic diagram of the gene's physical organization. For instance, the diagram on the zyg-1 Gene page reveals that zyg-1 has a functional gene, bli-2, nested within it (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_04.png). This diagram provides a graphical link to the Genome Browser (see Basic Protocol 7).
C. elegans is currently known to have ~20,100 protein-coding genes, as well as ~900 genes producing non-protein-coding RNA transcripts. A gene in C. elegans can be studied in several different ways. The original method was genetics, in which a gene was equated with a classical locus at which two or more different alleles had been mapped (Brenner, 1974). Despite all advances in functional genomics, classical genetics remains uniquely powerful in uncovering new aspects of worm biology (Jorgensen and Mango, 2002).
There are, accordingly, 44,606 Gene objects in the WS147 release of WormBase, marking either current or obsolete gene predictions for C. elegans. In the database, they are given uninformative serial numbers such as "WBGene00021622". In practice, however, these genes are referred to either by names like "xyz-N" (a three-letter lower-case abbreviation followed by N, an Arabic numeral) or by names like "cosmid.number" (where "cosmid" is the name of a genomic clone, usually a cosmid, sequenced in the C. elegans genome project (C. elegans Sequencing Consortium, 1998), and "number" denotes an otherwise anonymous gene embedded in the clone).
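The three naming styles described above can be told apart mechanically. The following is a minimal Python sketch, not part of WormBase itself: the function name classify_gene_name and the regular expressions are assumptions derived only from the conventions stated in this protocol, and real names may include variants these patterns do not cover.

```python
import re

# Patterns assumed from the naming conventions described in the text;
# they are illustrative and may not cover every real WormBase name.
WBGENE_ID = re.compile(r"WBGene\d{8}")          # e.g. WBGene00021622
CLASSICAL_NAME = re.compile(r"[a-z]{3}-\d+")    # xyz-N, e.g. zyg-1
SEQUENCE_NAME = re.compile(r"[A-Za-z0-9_]+\.\d+")  # cosmid.number


def classify_gene_name(name: str) -> str:
    """Return which naming style a gene identifier appears to follow."""
    if WBGENE_ID.fullmatch(name):
        return "database_id"
    if CLASSICAL_NAME.fullmatch(name):
        return "classical"
    if SEQUENCE_NAME.fullmatch(name):
        return "sequence"
    return "unknown"


if __name__ == "__main__":
    # "ZK1234.5" is a hypothetical string in cosmid.number format.
    for example in ["WBGene00021622", "zyg-1", "ZK1234.5", "not a gene"]:
        print(example, "->", classify_gene_name(example))
```

A check like this can help decide which kind of name has been typed before a search, but it says nothing about whether the gene actually exists in the database.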
There are fewer classical (three-letter) gene names for C. elegans genes (~6,100) than names for genes identified through genomic sequencing (~21,000). An increasing number of three-letter names are given to genes identified purely through genomic sequencing and analysis. Conversely, a significant number of genes identified through classical mutagenesis have not yet been linked to the genome through cloning. So, the rule of thumb is that, while Gene objects in WormBase will often have both a classical and a sequence name (as zyg-1 does), this is not inevitable; many genes in C. elegans either lack a classical name or have a classical (mutant) name without a known sequence name. At the same time, functional genomics is being applied to all genes in C. elegans through chromosome-wide RNAi screens, microarray analyses, and protein interaction mapping (Ge et al., 2003; Gunsalus et al., 2005). WormBase therefore annotates a good deal of information about "anonymous" genes with only nondescript sequence names.
If a set of Gene records is wanted, search "Any Gene" with a wildcard entry (such as unc-*), or run a basic search (for "Anything"). With a well-chosen query, this will produce a summary page listing one or more database entries in the general format Gene:xyz-N. Examine the entries and click on whichever is most useful.
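To illustrate what a wildcard entry such as unc-* selects, the sketch below applies shell-style matching to a small hand-typed list of gene names using Python's standard fnmatch module. It only mimics the idea of the wildcard; how WormBase itself evaluates such a query is not described here, and the helper name match_gene_names is hypothetical.

```python
from fnmatch import fnmatchcase


def match_gene_names(pattern, names):
    """Return the names that match a shell-style wildcard pattern.

    '*' matches any run of characters; matching is case-sensitive.
    """
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # A small illustrative list, not a WormBase query result.
    gene_names = ["unc-4", "unc-119", "zyg-1", "bli-2"]
    print(match_gene_names("unc-*", gene_names))
```

A narrower pattern returns fewer records, which is the practical meaning of a "well-chosen query" when the summary page would otherwise be long.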
Each Gene page for a genomic protein-coding sequence (CDS), even if it lacks mutant alleles mapped through classical genetics, has an interpolated genetic map position for that coding sequence, with a link to a list of nearby genes. This map position is formatted and handled in the same way as the genetic map position for a classical genetic locus (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_04.png). In the case of a gene like zyg-1, which already has mutant alleles, the Genetic Position link (http://www.wormbase.org/wiki/index.php/Image:Fig_1_8_04.png) will mainly help design tests for allelism with other mutant loci. However, for a gene identified solely as a CDS, interpolated map information can indicate which classical mutant alleles might reside within the gene.
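The idea of an interpolated map position can be sketched as simple linear interpolation between two flanking markers whose physical and genetic positions are both known. This is an assumption made for illustration: the procedure WormBase actually uses is not specified in this protocol, and the function name, marker coordinates, and units below are invented.

```python
from bisect import bisect_right


def interpolate_genetic_position(physical_bp, markers):
    """Estimate a genetic position by linear interpolation.

    markers: list of (physical_bp, genetic_position) pairs, sorted by
    strictly increasing physical position. Values are hypothetical.
    """
    if len(markers) < 2:
        raise ValueError("need at least two flanking markers")
    positions = [p for p, _ in markers]
    if not positions[0] <= physical_bp <= positions[-1]:
        raise ValueError("position lies outside the mapped markers")
    # Index of the first marker strictly to the right of the query.
    i = bisect_right(positions, physical_bp)
    if i == len(positions):  # query coincides with the last marker
        i -= 1
    (p0, g0), (p1, g1) = markers[i - 1], markers[i]
    return g0 + (g1 - g0) * (physical_bp - p0) / (p1 - p0)


if __name__ == "__main__":
    # Made-up markers: (physical position in bp, genetic position).
    example_markers = [(1000, 0.0), (3000, 2.0), (5000, 3.0)]
    print(interpolate_genetic_position(2000, example_markers))
```

Because the estimate depends entirely on the flanking markers, an interpolated position is best read as a guide to which classical loci lie nearby, not as a measured map position.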