From WormBaseWiki

The front Web page of WormBase, showing a general database search for zyg-1 and the Web Site Directory. This page gives several different entry points for WormBase’s diverse data. Shown here is the simplest and broadest search (for “Anything”) with a single keyword. A menu of the most-used database searches lines the top of the page, while a list of more specialized data fills the Web Site Directory on the page’s left side.

Results of the front-page database search for zyg-1. Having searched the entire database for anything matching zyg-1, one sees a plethora of disparate results: genes with zyg-1* names, protein-coding sequences (CDSes), expression patterns, and archived research papers. The advantage of this sort of search is that it lowers the chance of missing a wanted item, but it necessarily requires one to pick and choose among a slurry of mixed data. Alternatively, one could pick a specific data class in the search menu (such as “Any Gene” or “Cell”) and get narrower but better-focused results.

The top of the Gene page for zyg-1. WormBase organizes its data around a few key hubs. Gene pages are perhaps the most important such hub; they are intended to give a compact but full summary of everything known about a given gene in C. elegans. Even this excerpt summarizes gene function and orthology, lists transcripts and their experimental evidence, and links to DNA and protein sequences, a C. briggsae ortholog, and external database records.

Genetic and genomic information from the Gene page for zyg-1. Further down the same Gene page is a small but detailed diagram of the gene’s DNA structure, with links to transcripts, sequenced clones, and alleles. Alongside it are the exact nucleotide coordinates and the gene’s meiotic map position.

The Sequence page of F59E12.2 (linked to zyg-1’s Gene page). Most nucleotide-level sequence data are too detailed to be of immediate interest on a Gene page, so they are given their own Sequence page instead (linked to the Gene and Protein pages). These data are most useful in designing cloning experiments or direct perturbations of gene function such as RNAi. Further down this page are another schematic diagram, a BLAST search launcher, the exact coordinates of exons and introns in the genomic sequence, and a list of available cDNA clones.

Part of the CE28571 (zyg-1) Protein page, with a schematic diagram of CE28571’s exons, protein motifs, low-complexity domains (detected by SEG), and similarities to proteins in other eukaryotic species. As with nucleotide sequences, proteins have enough detailed information to require their own specialized pages. WormBase’s Protein pages give both text and diagrams so that a user can map individual sequence features relative to one another and to the protein’s exonic coding sequence. The features shown range from very generic (signal peptide, low-complexity, and predicted transmembrane regions) to broadly distributed but specific motifs (e.g., “tyrosine protein kinase”) and finally to individual BLAST matches with highly similar proteins in other organisms. Diagramming all of these lets the user quickly see which parts of the protein are likely to have distinct functions.

Protein motifs identified by a “ribonucleoprotein” search term. WormBase has an extensive catalog of protein motifs, taken from both the Pfam and the InterPro compilations. Keyword searches of these motifs are one way to subdivide a general protein type into several subtypes with detailed functional differences.

Proteins identified as sharing a single motif. Motifs are evolutionarily mobile; they can be spread among homologous proteins or transferred horizontally between nonhomologous ones. Accordingly, each motif in WormBase is listed with the full set of proteins that contain it. This gives one way of identifying every gene product in C. elegans likely to participate in a shared biochemical function.
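Listing each motif together with every protein that carries it amounts to inverting a protein-to-motif annotation table. The following is a minimal Python sketch of that inversion, not WormBase’s actual implementation or data model; the protein and motif identifiers are hypothetical placeholders.

```python
from collections import defaultdict

def index_by_motif(protein_motifs):
    """Invert a protein -> motifs mapping into motif -> sorted list of proteins."""
    motif_index = defaultdict(set)
    for protein, motifs in protein_motifs.items():
        for motif in motifs:
            motif_index[motif].add(protein)
    # Sort each protein list so the output is deterministic.
    return {motif: sorted(proteins) for motif, proteins in motif_index.items()}

# Hypothetical toy annotations (placeholder identifiers, not WormBase records).
annotations = {
    "protein_A": {"kinase_domain", "low_complexity"},
    "protein_B": {"kinase_domain"},
    "protein_C": {"RNA_binding"},
}

by_motif = index_by_motif(annotations)
print(by_motif["kinase_domain"])  # ['protein_A', 'protein_B']
```

Because the index is keyed by motif, a single lookup returns every annotated protein sharing that motif, which is the view this kind of motif page presents.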

A view of the entire X chromosome in the Genome Browser. Like the Gene page, the Genome Browser provides a central hub around which complex data can be economically organized. Here its view is expanded to an entire chromosome. The view is customizable with many different user-selected tracks (a few of which are visible at the bottom of the figure).

An expanded Genome Browser view of the F59E12.2 (zyg-1) sequence, with added tracks for ESTs, mRNAs, and C. briggsae homologies. Where the Gene page gives a text-oriented, human-readable summary of zyg-1, the Genome Browser gives a view rooted in its DNA structure. Picking just a few tracks lets this view link gene coexpression (through operons), likely regulatory sequences (i.e., non-coding DNA highly conserved in C. briggsae), direct evidence for gene activity (ESTs and a cDNA), a genomic clone (archived in GenBank), and complexities of the gene’s structure (including a nested gene with an entirely dissimilar mutant phenotype).

A view of 1 Mb of genomic DNA, centered on the F59E12.2 (zyg-1) sequence. Genome Browser views are customizable not only in their contents but in their size. Here a tracked view spans one megabase of genomic DNA. As the view grows, fine details are merged into a general map; this works best when one is looking for features that vary over a scale of tens or hundreds of thousands of nucleotides.

A view of 100 base pairs of genomic DNA immediately 5’ of F59E12.2 (zyg-1). The opposite extreme of size selection is this 100-nt view of zyg-1’s 5’ flank. This view lists individual nucleotides and is ideal for designing transgenic constructs or resolving cis-regulatory sites at fine scale. As in larger views, multiple tracks can be chosen to make easy comparisons of diverse features (such as cDNAs versus predicted start sites).
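Which coordinates count as “5’ of” a gene depends on the strand the gene lies on. The sketch below computes such a flank window under stated assumptions: 1-based, inclusive, forward-strand chromosome coordinates, a hypothetical gene spanning positions 1,000 to 2,500, and the function name `five_prime_flank`, which is illustrative and not a WormBase API.

```python
def five_prime_flank(start, end, strand, length=100):
    """Return (flank_start, flank_end) for the `length` bp immediately 5' of a
    feature spanning start..end (1-based, inclusive, forward-strand coordinates)."""
    if start > end:
        raise ValueError("start must not exceed end")
    if strand == "+":
        # A forward-strand gene's 5' flank lies at lower coordinates;
        # clip at position 1 (the chromosome start).
        return max(1, start - length), start - 1
    if strand == "-":
        # A reverse-strand gene's 5' flank lies at higher coordinates.
        # Clipping at the chromosome end is left to the caller (length unknown here).
        return end + 1, end + length
    raise ValueError("strand must be '+' or '-'")

# Hypothetical gene coordinates, for illustration only.
print(five_prime_flank(1000, 2500, "+"))  # (900, 999)
print(five_prime_flank(1000, 2500, "-"))  # (2501, 2600)
```

The same gene coordinates give windows on opposite sides depending on strand, which is why a strand-aware view is needed when examining a gene’s 5’ flank.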

The Genome Browser showing the C. briggsae ortholog of zyg-1. C. briggsae’s genome is also available through the Genome Browser. This view confirms that zyg-1’s complex structure is conserved in C. briggsae, while also showing small differences in intron size.

The Synteny Viewer showing the zyg-1/bli-2 cluster in C. elegans and C. briggsae. Here the zyg-1 loci from two Caenorhabditis species are shown in syntenic alignment, making their precise similarities and differences obvious. Like the Genome Browser, this view can be expanded to take in large chromosomal spans or contracted to single DNA sites. A particularly good use of this viewer is resolving evolutionarily complex syntenic regions as clearly as possible.

A BLASTP search of WormPep release 147 with the human dymeclin (DYM) protein, mutations in which cause Dyggve-Melchior-Clausen dysplasia or Smith-McCort dysplasia. BLAST searches in WormBase not only give hit results but also hyperlink each hit to its database record, making it easy to go from a positive result to its Gene page or to a view of its genomic region. Both strong and weak hits can be informative, since they can identify both orthologs and paralogs of a query sequence. Searches have a default E-value cutoff of 0.01, which the user can adjust: a lower cutoff is more stringent and returns fewer hits, while a higher cutoff is less stringent and returns more.
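The effect of the E-value cutoff can be illustrated by filtering a list of hits locally. This is a minimal sketch, not how BLAST applies its threshold internally; the `Hit` record, the hit values, and the choice of an inclusive boundary (E-value at or below the cutoff) are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    subject: str   # hypothetical subject identifier
    evalue: float  # expectation value reported for the hit

def filter_hits(hits, max_evalue=0.01):
    """Keep hits with E-value at or below the cutoff, best (lowest E) first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)

# Hypothetical hits, for illustration only.
hits = [Hit("hit_1", 1e-40), Hit("hit_2", 0.005), Hit("hit_3", 0.5)]
print([h.subject for h in filter_hits(hits)])         # ['hit_1', 'hit_2']
print([h.subject for h in filter_hits(hits, 1e-10)])  # ['hit_1']
print([h.subject for h in filter_hits(hits, 1.0)])    # ['hit_1', 'hit_2', 'hit_3']
```

Tightening the cutoff from 0.01 to 1e-10 drops the weaker hit, while relaxing it to 1.0 admits the weakest one, matching the stringency trade-off described above.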

The Filter menu of WormMart, with filters set to select pqn-* genes in C. elegans with uncoordinated RNAi phenotypes. WormMart gives the user a menu with which one or more of a great many different conditions can be imposed on data. Each condition is itself simple, but the user’s freedom to choose and mix them through a graphical interface makes highly complex searches practical. This particular search started by choosing the WS140 data release (shown in the Summary on the right-hand side) and its Gene data set. That choice alone still leaves the user with over 40,000 objects to sort through. In this simple search, the user has selected only those genes falling into the pqn class, which includes ~100 genes encoding prion-like proteins with domains highly enriched for glutamine (Q) or asparagine (N).
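Combining several simple conditions into one search can be modeled as applying a list of predicates that must all hold. The sketch below assumes the conditions are combined with a logical AND; it is not WormMart’s implementation, and the gene records (names and phenotypes) are invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical toy records; names and phenotypes are invented for illustration.
genes = [
    {"name": "pqn-A", "rnai_phenotypes": {"uncoordinated"}},
    {"name": "pqn-B", "rnai_phenotypes": {"embryonic lethal"}},
    {"name": "xyz-1", "rnai_phenotypes": {"uncoordinated"}},
]

def name_matches(pattern):
    """Condition: gene name matches a shell-style wildcard such as 'pqn-*'."""
    return lambda gene: fnmatchcase(gene["name"], pattern)

def has_phenotype(phenotype):
    """Condition: gene has the given RNAi phenotype."""
    return lambda gene: phenotype in gene["rnai_phenotypes"]

def apply_filters(records, *conditions):
    """Return the records that satisfy every condition (logical AND)."""
    return [r for r in records if all(cond(r) for cond in conditions)]

selected = apply_filters(genes, name_matches("pqn-*"), has_phenotype("uncoordinated"))
print([g["name"] for g in selected])  # ['pqn-A']
```

Each condition stays trivial on its own; the selectivity comes from composing them, which mirrors how mixing menu filters narrows tens of thousands of objects to a short list.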

The Output menu for Sequence attributes, showing several different choices of gene substructure. After filtering, data in WormMart need to be exported. Again, many different choices of output contents and format exist. One particularly useful form is sequence output, in which the user picks some type of gene structure (such as 5’ flanks, introns, or exons) for mass export from a gene set chosen with filters like those in WormMart’s Filter menu. As a given sequence-export option is picked, a small schematic diagram of the gene is marked in red to clarify what the option means in practice. Since the sequences are exported in FASTA format, the headers of these FASTA records can themselves be loaded with user-selected data (such as gene names).
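Loading user-selected data into FASTA headers can be sketched as joining chosen fields into each `>` line. This is an illustrative sketch only: the `|` separator, the 60-character line width, the field names, and the record contents are assumptions, not WormMart’s documented output format.

```python
def to_fasta(records, header_fields, width=60):
    """Format records as FASTA text, building each header from chosen fields."""
    lines = []
    for rec in records:
        # Join the user-selected fields into one header line.
        lines.append(">" + "|".join(str(rec[f]) for f in header_fields))
        seq = rec["sequence"]
        # Wrap the sequence at a fixed line width.
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"

# Hypothetical record; identifier, name, and sequence are placeholders.
records = [{"gene_id": "gene_1", "name": "pqn-A", "sequence": "ACGT" * 20}]
fasta = to_fasta(records, ["gene_id", "name"])
print(fasta.splitlines()[0])  # >gene_1|pqn-A
```

Keeping the header fields as a parameter lets the same exporter carry whatever attributes the user selected, which is what makes the exported FASTA files easy to cross-reference later.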

Final results of the pqn-* WormMart search. Another option for user-selected output is a table listing gene features rather than nucleotide sequences. This output was generated from the pqn-* search by selecting (in addition to the pqn gene class) molecular and classical gene names, RNAi phenotypes, and conserved orthologous protein groups (KOGs). As with the Genome Browser, a strength of these user-selected outputs is the ability to quickly compare disparate data sets in an easily scanned, well-aligned format.

The graphical output from a search for genetic markers in the vicinity of hid-3. Classical genetics in C. elegans remains crucial for finding new biological functions. Here the user has a gene map for the region around the uncloned hid-3 gene that integrates cloned genes, uncloned loci, predicted genes, and STS markers. Such a view makes it straightforward to design fine-scale STS mapping and to identify other loci that might be allelic to hid-3.

Part of the tabular output from a search for hid-3 markers. Graphic and tabular gene maps have complementary uses. The graphical map of the hid-3 region lets the user take in a genetic region intuitively at a glance; this table lists the exact identity and details of its contents, including the meiotic map position, alleles, and laboratory strains for each gene in the region.
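A marker search around a locus is essentially an interval query on map positions. The sketch below assumes loci are given as a name-to-map-position dictionary and that “nearby” means within a fixed number of map units of a center position; the function name, loci, and positions are hypothetical and do not reflect WormBase’s data or search logic.

```python
def loci_near(loci, center, window):
    """Return (name, map_position) pairs within `window` map units of `center`,
    ordered by map position."""
    nearby = [(name, pos) for name, pos in loci.items() if abs(pos - center) <= window]
    return sorted(nearby, key=lambda item: item[1])

# Hypothetical loci and map positions (arbitrary units), not real WormBase data.
loci = {"locus_a": -1.2, "locus_b": 0.3, "marker_c": 0.9, "locus_d": 4.0}
print(loci_near(loci, center=0.5, window=1.0))  # [('locus_b', 0.3), ('marker_c', 0.9)]
```

Sorting by position reproduces the left-to-right order of a map, so the same result can drive either a table or a graphical layout.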

Results of a Gene Ontology (GO) search for “RNA splicing”. GO allows genes to be classified by their shared biochemical or biological roles, whether or not their products are similar to each other. While this classification is powerful, it can be difficult to decipher because there are a great many GO terms, most with complex meanings. To help make sense of this complexity, GO term searches in WormBase return tables listing not only the names of terms but also their definitions and associated genes. Searching with a simple phrase such as “RNA splicing” can return many different terms with highly detailed meanings.

Summary of the “RNA splicing” GO term in WormBase, with its connections to genes and protein motifs. Each GO term has its own summary page, accessible either through a term search (such as the “RNA splicing” search) or through gene or protein motif pages. The broadly defined “RNA splicing” term is seen here to encompass two different genes and three different protein motifs. One link on this page leads to a browsable version of the entire Gene Ontology.

A detailed view of the “RNA splicing, via transesterification reactions with bulged adenosine as nucleophile” GO term in WormBase: defined, shown in the context of other GO terms, and connected to genes and protein motifs. This is another view of a GO term, this time with a browsable context. As on the “RNA splicing” summary page, links are given for associated genes and protein motifs, but here one can also see how this rather specialized term fits into the overall Gene Ontology. Note that this term is not actually the narrowest one possible; it is itself a parent term for three even more specialized terms.
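Seeing how a term fits into the Gene Ontology means walking a directed acyclic graph: upward to its ancestors and downward to its children. The sketch below uses a simplified, hypothetical fragment stored as a term-to-parents dictionary; `child_1` through `child_3` are placeholders for the three more specialized terms, and the fragment is not guaranteed to match GO’s current structure.

```python
def ancestors(term, parents):
    """Return all ancestors of `term` in a DAG given as term -> list of parent terms."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def children(term, parents):
    """Return the direct children of `term`, i.e. terms that list it as a parent."""
    return sorted(t for t, ps in parents.items() if term in ps)

# Simplified, hypothetical fragment; child_1..child_3 stand in for the three
# more specialized terms and are not real GO term names.
BULGED = "RNA splicing, via transesterification reactions with bulged adenosine as nucleophile"
parents = {
    "RNA splicing, via transesterification reactions": ["RNA splicing"],
    BULGED: ["RNA splicing, via transesterification reactions"],
    "child_1": [BULGED],
    "child_2": [BULGED],
    "child_3": [BULGED],
}
print(sorted(ancestors(BULGED, parents)))
print(children(BULGED, parents))  # ['child_1', 'child_2', 'child_3']
```

The ancestor walk uses a `seen` set because a GO term can have several parents, so the same ancestor may be reachable by more than one path.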

An expanded view of neuronal lineages. WormBase gives a graphically browsable diagram of C. elegans’ entire developmental lineage, from the fertilized egg to the adult body. Shown here is a small subset of that lineage, starting from the progenitor cell P1. Each node can be collapsed or expanded by clicking on it, to give simplified or elaborated views; here all the nodes have been expanded. Each cell type has a hypertext link to its own Cell page.
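A collapsible lineage view can be modeled as a tree whose nodes carry an expanded/collapsed flag, with rendering skipping the descendants of collapsed nodes. This is a minimal sketch, not WormBase’s implementation; the `Node` class and the toy tree are hypothetical, and the cell names only mimic lineage notation without reflecting the real P1 lineage.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    expanded: bool = True

    def toggle(self):
        """Collapse an expanded node or expand a collapsed one (like a click)."""
        self.expanded = not self.expanded

def render(node, depth=0):
    """Return indented lines for the tree; descendants of collapsed nodes are hidden.
    '+' marks a collapsed node that has hidden children, '-' marks all others."""
    marker = "+" if node.children and not node.expanded else "-"
    lines = ["  " * depth + marker + " " + node.name]
    if node.expanded:
        for child in node.children:
            lines.extend(render(child, depth + 1))
    return lines

# Toy tree; names only mimic lineage notation and do not reflect the real P1 lineage.
root = Node("P1", [Node("P1.a", [Node("P1.aa"), Node("P1.ap")]), Node("P1.p")])
root.children[0].toggle()
print("\n".join(render(root)))
```

Storing the flag on each node, rather than recomputing visibility globally, keeps a click local: toggling one node changes only its own subtree in the rendered view.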

The Cell page for AS1, a neuron in the P1 lineage. Clicking on the AS1 link in P1’s lineage leads to this report, summarizing developmental and functional traits for this cell type. A single cell can belong to more than one group, defined either by cell class or by organ or tissue. Cells can be major progenitors of a lineage branch (blast cells), intermediates during development, or terminally differentiated. They can also have many different gene expression patterns associated with them, either generically (e.g., a gene expressed in “neurons” is implicitly expressed in AS1) or more or less specifically (some genes may be expressed only in AS1, while others may be expressed in some well-defined set of cells that includes AS1). Although WormBase has so far tended to emphasize a gene-centric view of the organism, Cell pages are likely to become increasingly detailed hubs of information, rivaling the Gene pages and Genome Browser, as WormBase’s contents extend to integrative, physiological data.

Diagram of the ADAL neuron. This is another Cell page, for a sensory neuron of the head. Pages for sensory and some other neurons include small diagrams of their structure in the body, with the pharynx given as a background for orientation. More specialized and fully detailed anatomical views are available in WormAtlas. Each neuron page also gives a detailed list of neuron-by-neuron connections, determined from electron-microscopic serial sectioning of an entire worm’s nervous system.

Summary for a chosen set of neurons. Neurons in C. elegans have somewhat cryptic names (e.g., “AFD” for “amphid finger neuron”). The tabular output from a Neuron search decodes these names by listing their human-readable identities, their membership in neuronal groups (by shared ganglia or shared traits), and their developmental lineage abbreviations (which are entirely different from their differentiated neuron abbreviations). The numbers of their synaptic connections and gap junctions are also given; the identities of these partners are detailed on each neuron’s Cell page.

Results for an Expression Pattern search with “AS neurons”. Gene expression patterns can be searched with terms for predefined cell groups, extracted from the primary literature. The resulting table gives, for each pattern found, the gene driving it and a summary of the cells it includes. Since these expression patterns are usually driven by a gene’s entire promoter, and since metazoan promoters can have complex, multiple cis-regulatory elements, the patterns can be heterogeneous and extensive. However, some genes are expressed in only a single cell type in the whole animal, while others appear to be expressed ubiquitously in all somatic cells. Hypertext links from these search results can lead to a gene, its DNA sequence, or a detailed report about a single expression pattern (e.g., “Expr217”). The expression pattern report, in turn, lists the exact reagents used to determine the pattern (such as a defined DNA region or an antibody).

A diagram of overlaps between zyg-1’s canonical (sequenced) genomic cosmid clone and other clones that are not sequenced but may have more useful termini for experimental use. The C. elegans genome project produced many more clones than were actually sequenced. The default view in the Genome Browser shows only those cosmid or YAC clones that were actually used in genomic sequencing; however, for experiments on individual genes, a “non-canonical” cosmid’s insert may better encompass the gene’s full 5’ and 3’ flanks. The clone map search lets users see the entire set of clones available for a gene region.