WormBase nomenclature - what every user should know
Species
In WormBase, species are referred to by their Linnaean binomial name (e.g. Caenorhabditis elegans or C. elegans).
The following species have their gene annotation manually curated.
- C. elegans
- C. brenneri
- C. briggsae
- C. japonica
- C. remanei
- Brugia malayi
All other species in WormBase have their gene annotation imported from the authors or are predicted.
Genomes, assemblies, clones and contigs
Reference genomes in WormBase are given version names, for example C. elegans has the version names: WBcel215 (an old version) and WBcel235 (the current version).
A. suum has two assemblies from different groups in WormBase, and these have the version names AscSuum_1.0 and ASU_2.0.
The genomes of most species in WormBase are incompletely assembled, being left as various sizes of contig. Only C. elegans and C. briggsae have been assembled into chromosomes.
The chromosomes of C. elegans have the names:
- CHROMOSOME_I
- CHROMOSOME_II
- CHROMOSOME_III
- CHROMOSOME_IV
- CHROMOSOME_V
- CHROMOSOME_X
- CHROMOSOME_MtDNA
These may be abbreviated to the chromosome letter (I, II, III, IV, V, X, MtDNA).
The C. elegans chromosomes are composed of tiling paths of clones from the original sequencing project.
These clones have names like 'B0001', 'R12C12', 'VF38E11R', 'Y48E1B', 'ZK6', as given by the various groups that produced the clones.
The chromosomes of C. briggsae have the names:
- chrI
- chrI_random
- chrII
- chrIII
- chrIII_random
- chrIV
- chrIV_random
- chrV
- chrV_random
- chrX
- chrX_random
- chrun
The C. briggsae '*_random' chromosomes are those where the clones are known to belong to a chromosome, but their order and position in the chromosome is unknown.
The 'chrun' chromosome is a set of clones whose chromosome is not known.
CDS and Transcript
A CDS (coding sequence structure) is the part of a gene locus that codes for a protein product.
In C. elegans these are guaranteed to start with an AUG (or another legal initiation codon) and to end with a STOP codon, with no internal STOP codons. The one exception is scbp-2, which has a non-standard initiation codon.
This may not always be the case in other curated WormBase species' CDSs.
CDSs are the only part of a gene locus that is manually curated. The protein product, the Transcript and the Gene span are all constructed automatically from the CDS structure and available transcript evidence.
A CDS is named in C. elegans after the clone it is created on followed by a dot and the next available number for naming objects on the clone. For example 'AC3.3' is the third CDS made on the clone 'AC3'.
EST and mRNA evidence is then used to extend the CDS structure to model the 5'UTR and the 3'UTR of the expected mature mRNA Transcript of the CDS.
If there is evidence for more than one isoform of a CDS at a locus, the isoforms are distinguished by letters appended to their names. For example, if there is evidence for a different CDS structure at the locus of the 'AC3.3' CDS, then the existing CDS will have its Sequence Name changed to 'AC3.3a' and the new one will have the Sequence Name 'AC3.3b'. There is nothing special about CDS names ending in 'a': they are not necessarily longer, better annotated or more important than the other isoforms at that locus. They are simply the first structure that was created. Transcripts of the CDS isoforms will have the same Sequence Names as the CDSs ('AC3.3a' and 'AC3.3b', for example).
If there is evidence for alternative splicing in the 5'UTR or the 3'UTR of a Transcript object, then Transcript isoforms will automatically be created. These Transcript isoforms are distinguished by adding a dot and a number after the Sequence Name of the CDS. For example, if 'AC3.3a' has alternative splicing in a UTR giving rise to two Transcript isoforms, the CDS structure will be unaffected and it will retain its name, but two Transcript isoforms will be created with the names 'AC3.3a.1' and 'AC3.3a.2'.
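The naming rules above can be sketched as a small parser. This is a minimal illustration in Python, not WormBase code: the names `SequenceName`, `parse_sequence_name` and `gene_sequence_name` are hypothetical, and the pattern assumes that clone names contain only letters, digits and underscores.

```python
import re
from typing import NamedTuple, Optional


class SequenceName(NamedTuple):
    clone: str                 # clone the CDS was created on, e.g. 'AC3'
    number: int                # object number on that clone, e.g. 3
    isoform: Optional[str]     # CDS isoform letter, e.g. 'a', or None
    transcript: Optional[int]  # Transcript isoform number, e.g. 1, or None


# clone '.' number [isoform letter] ['.' transcript number]
_SEQUENCE_NAME = re.compile(r"([A-Za-z0-9_]+)\.(\d+)([a-z])?(?:\.(\d+))?")


def parse_sequence_name(name: str) -> SequenceName:
    """Split a CDS or Transcript Sequence Name into its parts."""
    match = _SEQUENCE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised Sequence Name: {name!r}")
    clone, number, isoform, transcript = match.groups()
    return SequenceName(
        clone,
        int(number),
        isoform,
        int(transcript) if transcript is not None else None,
    )


def gene_sequence_name(name: str) -> str:
    """Drop isoform letter and transcript number to get the gene-level name."""
    parsed = parse_sequence_name(name)
    return f"{parsed.clone}.{parsed.number}"
```

For example, `parse_sequence_name("AC3.3a.2")` yields the clone `AC3`, object number 3, CDS isoform `a` and Transcript isoform 2, while `gene_sequence_name("AC3.3a.2")` returns `AC3.3`.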
Genes and gene classes
When a new locus is implicitly created by creating a new CDS or non-coding RNA transcript structure at a location on the genome, that locus has a Gene created which is assigned a WormBase Gene ID like 'WBGene00000024'. The Gene ID uniquely refers to this locus (with all of its CDS and Transcript structures) in the WormBase database. It can be used in publications to identify this gene, but it is not a very human-friendly way of referring to a Gene and is prone to copying mistakes. As an alternative to the Gene ID, users may refer to a Gene by the Sequence Name. The Sequence Name is the name that was given to the CDS or ncRNA Transcript at that locus, but without the letters at the end that distinguish isoforms. For example 'AC3.3'. Use of either the Gene ID ('WBGene00000024') or the Sequence Name ('AC3.3') is acceptable in publications.
An alternative to using the Gene ID or Sequence Name is to use a Gene Name. These are composed of the Class of the gene followed by a hyphen and a number, for example 'abu-1'. About 9000 coding genes have currently been assigned a Gene Name and there are currently about 2500 Gene Classes.
Gene Name nomenclature is controlled by WormBase.
See below for instructions on how to propose a new Gene Name.
When a C. elegans gene that has a Gene Name (for example 'tra-1') has a homolog in another species, the homolog is given a Gene Name constructed from the C. elegans Gene Name with a three-letter species prefix, like 'Cbr-tra-1' in C. briggsae.
| Species | Prefix |
| Brugia malayi | Bma- |
| C. brenneri | Cbn- |
| C. briggsae | Cbr- |
| C. japonica | Cjp- |
| C. remanei | Cre- |
| Onchocerca volvulus | Ovo- |
(Also see Other Nematodes section).
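As an illustration of the prefix convention, the table above can be held in a lookup and used to build or split homolog names. This is only a sketch: `SPECIES_PREFIX`, `homolog_gene_name` and `split_species_prefix` are hypothetical names, and the lookup covers only the species listed in the table.

```python
# Species prefixes exactly as listed in the table above (without the hyphen).
SPECIES_PREFIX = {
    "Brugia malayi": "Bma",
    "C. brenneri": "Cbn",
    "C. briggsae": "Cbr",
    "C. japonica": "Cjp",
    "C. remanei": "Cre",
    "Onchocerca volvulus": "Ovo",
}


def homolog_gene_name(elegans_gene_name: str, species: str) -> str:
    """Build the Gene Name of a homolog from the C. elegans Gene Name."""
    return f"{SPECIES_PREFIX[species]}-{elegans_gene_name}"


def split_species_prefix(gene_name: str):
    """Return (prefix, base name); prefix is None if no known prefix is present."""
    head, sep, rest = gene_name.partition("-")
    if sep and head in SPECIES_PREFIX.values():
        return head, rest
    return None, gene_name
```

`homolog_gene_name("tra-1", "C. briggsae")` gives `Cbr-tra-1`, and `split_species_prefix("Cbr-tra-1")` recovers the prefix `Cbr` and the C. elegans name `tra-1`.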
Proteins
Protein sequences are automatically translated from the CDS. Each
sequence is automatically assigned a Protein ID, like 'WP:CE05133'.
Each protein sequence has one unique Protein ID. This means that when
two identical CDSs from different genes are translated to give the
same protein sequence, their protein products will have the same
Protein ID. It also means that when any CDS is changed in the course of
manual curation to have a different structure, its protein product
will have a new Protein ID.
The consequences of this are that any protein sequence can be
unambiguously referred to, but the ID of the protein product of a CDS
may change between WormBase releases.
A second way of referring to a protein is to use its Protein
Name. This is composed of the Gene Name of the Gene of the CDS with
the letters in uppercase. This is a way of referring to whatever the
protein product is without specifying a particular sequence from the
CDS. For example the CDS 'AC3.3' whose Gene has a Gene Name of
'abu-1', currently produces the protein with a Protein ID of
'WP:CE05133' and its protein product will always be referred to as the
Protein Name 'ABU-1'.
When a gene has CDS isoforms, each isoform will produce a different
protein sequence. For example the Gene 'tra-1' has two isoforms
('Y47D3A.6a' and 'Y47D3A.6b'). The protein products of these CDS
isoforms have the Protein Names 'TRA-1, isoform a' and 'TRA-1, isoform
b'.
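The relationship between Gene Name, CDS isoform and Protein Name described above can be sketched as follows. The function `protein_name` is a hypothetical helper written for this illustration; it assumes the isoform letter is the single lowercase letter at the end of the CDS Sequence Name.

```python
import re
from typing import Optional


def protein_name(gene_name: str, cds_name: Optional[str] = None) -> str:
    """Derive a Protein Name from a Gene Name and, optionally, a CDS name.

    The Gene Name is uppercased; if the CDS name ends in an isoform
    letter, ', isoform <letter>' is appended.
    """
    base = gene_name.upper()
    if cds_name is None:
        return base
    match = re.search(r"\.\d+([a-z])$", cds_name)
    if match is None:
        # CDS without an isoform letter, e.g. 'AC3.3'
        return base
    return f"{base}, isoform {match.group(1)}"
```

With this sketch, `protein_name("abu-1", "AC3.3")` returns `ABU-1` and `protein_name("tra-1", "Y47D3A.6b")` returns `TRA-1, isoform b`. Note that the Protein ID (such as 'WP:CE05133') cannot be derived this way, because it is assigned per sequence.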
How to apply for new names
Genetic Nomenclature for Caenorhabditis elegans
Genetic nomenclature for Caenorhabditis elegans
is supervised by <a href="http://wormbase.org/">WormBase</a> in collaboration
with the <a href="http://www.cbs.umn.edu/CGC/">Caenorhabditis Genetics Center (CGC)</a>.
How to Register a New Gene Class or Gene Name
Investigators wishing to register new gene names for
C. elegans should note the summary guidelines
below and apply online via <a href="http://minerva.caltech.edu/~azurebrd/cgi-bin/forms/gene_name.cgi">WormBase</a>
or by email application to genenames@wormbase.org.
How to Register a New Laboratory and Receive Lab, Strain and Allele designations
Specific identifying codes (CGC designations) are assigned to each
laboratory engaged in dedicated long-term genetic research on
C. elegans. Each such laboratory is
assigned a lab/strain code, for naming strains, and an allele code,
for naming mutations and transgenes. These codes are listed at the
<a href="http://www.cbs.umn.edu/cgc/gene-names">CGC</a>.
Investigators requiring new CGC designations should apply to <A HREF="mailto:tim.schedl@wormbase.org">Tim Schedl</A>.
Summary Guidelines for Proposing New Gene Names
Standard Genetic Nomenclature Recommendations
This summary is based on the original proposals for
C. elegans nomenclature (Horvitz et al.,
1979 Mol. Gen. Genet. 175: 129-133), plus additional
recommendations that have been distributed in
The Worm Breeder's Gazette or posted on <a href="http://www.wormbase.org">WormBase</a>.
Last edited by <a href="/resources/person/WBPerson2970" class="person-link" title="">Mary Ann Tuli</a> – 1 year ago
Genetic Loci
-
A Gene is a region that is expressed or a region that has been expressed and is now a Pseudogene.
-
A gene can be a Pseudogene, or can express one or more non-coding RNA genes (ncRNA) or protein-coding sequences (CDS).
-
All WormBase genes have a unique identifier like WBGene00006415.
-
This is guaranteed to consistently follow the gene throughout any changes that may be made to its structure.
-
When gene structures are split into two genes, the original gene ID will usually apply to the 5' gene and a new gene ID will be created for the other half.
-
All C. elegans WormBase genes also have a Sequence Name, which is derived from the cosmid, fosmid or YAC clone on which they reside, for instance F38H4.7, indicating it is on the cosmid F38H4, and there are at least 6 other genes on that cosmid.
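When handling identifiers programmatically, it can help to check that a string looks like a Gene ID before using it. The sketch below assumes that a Gene ID is 'WBGene' followed by exactly eight digits; that width is inferred from the examples in this document ('WBGene00000024', 'WBGene00006415') and is not a stated rule. The function names are hypothetical.

```python
import re

# Assumption: 'WBGene' followed by exactly eight digits, as in the examples.
_GENE_ID = re.compile(r"WBGene(\d{8})")


def is_gene_id(text: str) -> bool:
    """Return True if the text has the shape of a WormBase Gene ID."""
    return _GENE_ID.fullmatch(text) is not None


def gene_id_number(text: str) -> int:
    """Return the numeric part of a Gene ID, e.g. 6415 for 'WBGene00006415'."""
    match = _GENE_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a Gene ID: {text!r}")
    return int(match.group(1))
```

A Sequence Name such as `F38H4.7` is rejected by `is_gene_id`, which makes the check a simple way to tell the two kinds of identifier apart.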
Approved gene names
-
If a gene produces a protein that can be classified as a member of a family, the gene may also be assigned an Approved name consisting of three or four italicized letters,
a hyphen, and an italicized Arabic number, e.g., unc-30 indicating that this is the 30th member of the unc gene family.
-
There are a few exceptions to this format, like the genes cln-3.1, cln-3.2, and cln-3.3 which all are equally similar to the human gene CLN3.
-
CGC gene names for non-elegans species in WormBase have the 3-letter species code prepended, like Cre-acl-5, Cbr-acl-5, Cbn-acl-5.
-
The gene name may on rare occasions be followed by an italicized Roman numeral, to indicate the linkage group on which the gene maps, e.g., dpy-5 I or let-37 X or mlc-3 III.
-
Assignment of gene family names is controlled by WormBase. Requests for names should be made before publication, <A HREF="http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/gene_name.cgi">via the form</A> or via email to <A HREF="mailto:genenames@wormbase.org">genenames@wormbase.org</A>.
- For genes defined by mutation, the Approved gene names refer to the mutant phenotype originally detected or most easily scored e.g.
- dumpy (dumpy) in the case of dpy-5
- lethal (lethal) in the case of let-37.
- For genes defined on the basis of sequence similarity or sequence features,
the Approved gene name refers to the predicted protein product or RNA product e.g.
- myosin light chain in the case of mlc-3,
- superoxide dismutase in the case of sod-1,
- NPHP (human kidney disease nephronophthisis gene) in the case of nph-4.
- ribosomal RNA in the case of rrn-1.
- Genes with related properties are usually given the same three-letter name
and different numbers. For example, there are three known myosin light chain genes: mlc-1, mlc-2, mlc-3,
and more than twenty different dumpy genes: dpy-1, dpy-2, dpy-3, and so on.
- Genes can be given names corresponding to homologous named genes in other standard genetic organisms. e.g.
- rnt-1 is the C. elegans ortholog of the Drosophila gene runt.
- wrn-1 is the C. elegans ortholog of the human gene WRN1, responsible for Werner's syndrome.
- Gene names that are memorable, informative and simply explained are encouraged.
- Genes in a paralogous set related to a single named gene in another organism are
sometimes given the same gene name and number, followed by a distinguishing decimal.
e.g. four C. elegans genes homologous to SIR2 in S. cerevisiae
have been given the names sir-2.1, sir-2.2, sir-2.3, sir-2.4.
- Gene names based solely on RNAi phenotypes or high-throughput analysis of gene expression or protein interaction are discouraged.
- Gene names including c (for Caenorhabditis), ce (for C. elegans), n (for nematode) or w (for worm) are discouraged. Instead, an optional prefix Cel- can be added to indicate the species origin.
- A limited number of genes have been given temporary tag- names (tag =
temporarily assigned gene name). These are genes for which
deletion alleles have been generated by reverse genetic methods, but which have not
yet been given more informative names based on sequence or mutant phenotype.
When sufficient information becomes available, each tag name will be
replaced by an appropriate standard 3-letter or 4-letter name.
- A limited number of genes, named on the basis of sequence homology,
have been given non-standard names ending with alphanumeric identifiers
rather than with simple numbers, in order to make these names closer to
the generally accepted names used in other organisms. e.g. eif-3.B,
eif-3.C encode proteins of the conserved translation factor eIF3.
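The format rules in this list can be summarised as a pattern. The following is a rough sketch only, built from the rules and examples given here: an optional three-letter species prefix, a three- or four-letter gene class, a hyphen, a number, and an optional suffix after a dot (a number as in sir-2.1, or a capital letter as in eif-3.B). The names `GeneName` and `parse_gene_name` are hypothetical, and approved names that fall outside these stated rules would be rejected.

```python
import re
from typing import NamedTuple, Optional


class GeneName(NamedTuple):
    species_prefix: Optional[str]  # e.g. 'Cbr', or None
    gene_class: str                # e.g. 'unc'
    number: int                    # e.g. 30
    suffix: Optional[str]          # e.g. '1' in sir-2.1, 'B' in eif-3.B


_GENE_NAME = re.compile(
    r"(?:([A-Z][a-z]{2})-)?"   # optional species prefix such as 'Cbr-'
    r"([a-z]{3,4})-(\d+)"      # gene class, hyphen, number
    r"(?:\.(\d+|[A-Z]))?"      # optional distinguishing suffix
)


def parse_gene_name(name: str) -> GeneName:
    """Split an approved-style gene name into its components."""
    match = _GENE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised gene name: {name!r}")
    prefix, gene_class, number, suffix = match.groups()
    return GeneName(prefix, gene_class, int(number), suffix)
```

For instance, `parse_gene_name("Cbr-acl-5")` separates the species prefix `Cbr` from the class `acl` and the number 5.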
Approved Gene Name Conflicts
Approved Gene names that have been established in the published literature
and databases should preferably not be changed. In cases where a
gene has received multiple names, one name will be adopted as the
main name for the gene. Other names will continue to be listed in
databases. Whenever possible, name changes or the adoption of a
single main name should be made with the approval of all laboratories
concerned.
Homologous Genes
If a homolog of a known C. elegans gene
is identified in a related species such as Caenorhabditis briggsae,
this can be given the same gene name, preceded by three italic letters
referring to the species, and a hyphen. For example, Cbr-tra-1
is the name for the C. briggsae homolog of the
C. elegans gene tra-1. The
C. elegans homolog of a gene identified
and named in another organism can be distinguished by the same convention,
using "Cel-" as an optional prefix. For example, Cel-snt-1
defines the C. elegans synaptotagmin gene.
Alleles and Mutations
-
Every mutation has a unique designation. Mutations are given names
consisting of one or two italicized letters followed by an italicized
Arabic number, e.g., e61 or mn138 or st5.
The letter prefix refers to the laboratory of isolation, as registered
with the <a href="http://www.cbs.umn.edu/cgc/lab-head">CGC</a>.
There are currently more than 500 registered laboratories. For example, e
refers (originally) to the MRC Laboratory of Molecular Biology (Cambridge, U.K.),
(currently) to the laboratory of J. Hodgkin (University of Oxford), and st
refers to the laboratory of R.H. Waterston (originally at Washington University,
St. Louis, MO, currently at the University of Washington, Seattle).
-
When gene and mutation names are used together, the mutation name is included
in parentheses after the gene name, e.g., dpy-5(e61), let-37(mn138).
When unambiguous (e.g., if only one mutation is known for a given gene or if
all work on a gene described in a publication used a single mutation cited in
a Methods section), gene names are used in preference to mutation names
(let-37 rather than mn138 or let-37(mn138)).
-
Optional suffixes indicating characteristics of a mutation can follow a
mutation name. These are usually two-letter nonitalicized letters, e.g.,
hc17ts, where ts stands for temperature-sensitive, or pk15te,
where te stands for transposon-excision.
-
Mutations created by in vitro mutagenesis should receive standard
allele names. For cases where a pre-existing genomic mutation is
re-created by in vitro mutagenesis, it is still desirable to give
the new mutation a new name.
-
The wild-type allele of a gene is defined as that present in the Bristol
N2 strain, stored frozen at the CGC and other locations. Wild-type alleles
can be designated by a plus sign immediately after the gene name, dpy-5+,
or, more commonly, by including the plus sign in parentheses, dpy-5(+).
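The allele conventions in this section can be illustrated with a short sketch. It follows the description given here (one or two lowercase letters for the laboratory prefix, a number, and an optional two-letter suffix such as ts or te); `parse_allele` and `format_gene_allele` are hypothetical helper names.

```python
import re

# lab prefix (one or two letters), number, optional two-letter suffix
_ALLELE = re.compile(r"([a-z]{1,2})(\d+)([a-z]{2})?")


def parse_allele(name: str):
    """Return (lab_prefix, number, suffix) for a mutation name."""
    match = _ALLELE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised allele name: {name!r}")
    prefix, number, suffix = match.groups()
    return prefix, int(number), suffix


def format_gene_allele(gene: str, allele: str) -> str:
    """Write a gene and mutation together, e.g. dpy-5(e61) or dpy-5(+)."""
    return f"{gene}({allele})"
```

Here `parse_allele("hc17ts")` gives the prefix `hc`, the number 17 and the suffix `ts`, and `format_gene_allele("dpy-5", "+")` gives the wild-type designation `dpy-5(+)`.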
Gene Knockouts
Most gene knockouts constructed to date are small deletions (<5 kb) generated
by transposon excision or by chemical mutagenesis. These are named as alleles,
sometimes with the optional suffix te (transposon-excision) or ko (knockout).
Example: zyx-1(gk190) is a 777 bp deletion in the zyx-1 gene.
Some knockouts have been made by insertion of a selectable marker, such as
unc-119(+). These are named as alleles, with an optional descriptor
defining the selected marker following the unique allele name, and preceded
by a double colon. Example: jf61 = zhp-3(jf61::unc-119+)
Some of the small deletions generated by reverse genetic methods may remove
parts of two adjacent genes. If only two genes appear to be affected, then
the deletion is given a single allele name, but the genotype is written with
both gene names coupled with an ampersand (&). Example: allele ok615
is a 1422 bp deletion of two adjacent genes, so it can be written rad-54&tag-157(ok615).
Deletions that affect more than two genes are named as Deficiencies (Df),
as described in the Chromosomal Aberrations section.
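The knockout notations described above can be assembled mechanically. The sketch below is illustrative only, with hypothetical function names; it refuses deletions affecting more than two genes, since those are named as deficiencies instead.

```python
def deletion_genotype(genes, allele: str) -> str:
    """Write a deletion allele affecting one or two adjacent genes."""
    if len(genes) == 1:
        return f"{genes[0]}({allele})"
    if len(genes) == 2:
        # Two affected genes are coupled with an ampersand.
        return f"{genes[0]}&{genes[1]}({allele})"
    raise ValueError("deletions affecting more than two genes are named as Df")


def marker_insertion(gene: str, allele: str, marker: str) -> str:
    """Write a knockout made by inserting a selectable marker."""
    return f"{gene}({allele}::{marker})"
```

`deletion_genotype(["rad-54", "tag-157"], "ok615")` reproduces `rad-54&tag-157(ok615)`, and `marker_insertion("zhp-3", "jf61", "unc-119+")` reproduces `zhp-3(jf61::unc-119+)`.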
Modifiers: Suppressors, Revertants and Enhancers
There is no special nomenclature for modifier mutations. Many extragenic
suppressor loci are called sup (40 sup loci defined so
far, with a wide variety of properties and mechanisms). An increasing
number of more specific modifier gene classes have been established, such
as smu (suppressor of mec and unc), and smg
(suppressor with morphogenetic effect on genitalia)
and sel (suppressor/enhancer of lin-12).
Intragenic suppressors or modifiers are indicated by adding a second
mutation name within parentheses; for example, unc-17(e245e2608)
is an intragenic partial revertant of unc-17(e245).
Mutations known to be chromosomal rearrangements, rather than intragenic lesions,
are named differently, as described in the Chromosomal Aberrations section.
Chromosomal Aberrations
Duplications (Dp), deficiencies (Df), inversions (In)
and translocations (T) are known in C. elegans
cytogenetics; these are given italicized names consisting of the laboratory
mutation prefix, the relevant abbreviation, and a number, optionally followed
by the affected linkage groups in parentheses (e.g., eT1(III;V), mnDp5(X;f),
where f indicates a free duplication). Chromosomal balancers of unknown
structure can be designated using the abbreviation C, e.g., mnC1(II).
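A name of this form can be taken apart with a regular expression. This is a sketch based only on the description above (laboratory prefix, abbreviation, number, optional linkage groups in parentheses); `parse_aberration` and `ABERRATION_TYPES` are hypothetical names, and the laboratory prefix is assumed to be one or two lowercase letters.

```python
import re

ABERRATION_TYPES = {
    "Dp": "duplication",
    "Df": "deficiency",
    "In": "inversion",
    "T": "translocation",
    "C": "balancer of unknown structure",
}

_ABERRATION = re.compile(r"([a-z]{1,2})(Dp|Df|In|T|C)(\d+)(?:\(([^)]*)\))?")


def parse_aberration(name: str) -> dict:
    """Split a chromosomal aberration name into its components."""
    match = _ABERRATION.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised aberration name: {name!r}")
    prefix, kind, number, groups = match.groups()
    linkage = [g.strip() for g in groups.split(";")] if groups else []
    return {
        "lab_prefix": prefix,
        "type": ABERRATION_TYPES[kind],
        "number": int(number),
        "linkage_groups": linkage,
    }
```

For example, `parse_aberration("mnDp5(X;f)")` reports a duplication from the `mn` laboratory on linkage groups `X` and `f`.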
Last edited by <a href="/resources/person/WBPerson2970" class="person-link" title="">Mary Ann Tuli</a> – 19 days ago
DNA sequences
-
There are no specific recommendations for designating cloned sequences that
are not similar to known genes. Most genomic clones have been provided by
the C. elegans mapping/sequencing consortium
(based at the <a href="http://wiki.wormbase.org/index.php?title=Cosmids/YACs">Wellcome Trust Sanger Institute, Cambridge, UK</a>,
and the <a href="http://genome.wustl.edu/">Genome Sequencing Center, St. Louis, USA</a>).
Cosmid clones generated by the consortium are named according to the vector, either pJB8 (initial letters B, C, D, E, R, M, ZC) or a Lorist vector (initial letters K, T, W, F, ZK). Phage clones (in Lambda 2001) are identified by the initial letters A, ZL, YSL. Some fosmid clones are identified by the initial letter H. Vancouver fosmid clones are identified by the initial letters WRM.
-
YACs (yeast artificial chromosome clones) are identified by the initial letter Y, e.g., Y3D5. YAC subsequences may be given names derived from the initial YAC name. Example: subsequences derived from the YAC Y47H9 have been called Y47H9A, Y47H9B, Y47H9C. Note that physical clones corresponding to these subsequences are not available.
-
Genomic DNA clones that have not been generated by the consortium are usually designated by the laboratory strain designation (see Strains section), a #
symbol and an isolation number, e.g., MT#JAL6.
-
Sequences that are predicted to be genes from sequence data alone are
initially named by the consortium on the basis of the sequenced cosmid,
plus a number. For example, the genes predicted for the cosmid T05G3
are called T05G3.1, T05G3.2, etc. (numbered in arbitrary order of definition).
Such names can be superseded by standard 3-letter names when this becomes
appropriate. Thus, R13F6.3 has been given the name srg-12 (for serpentine receptor, class gamma).
-
EST (Expressed Sequence Tag) clones historically received names with prefixes such as cm and yk, but the INSDC accession number is now preferentially used for any new EST data.
Last edited by <a href="/resources/person/WBPerson1983" class="person-link" title="">Paul Davis</a> – 239 days ago
Transposons and Transposon Insertions
-
Types of C. elegans transposons are called Tc1, Tc2, etc., where
each number represents a different family. Transposon names are not italicized
except when included in a genotype. Different races of
C. elegans have different distributions of these transposons, which result
in polymorphic differences from the reference wild-type strain Bristol N2. These
natural differences between races are given polymorphism names, as described below.
-
The endogenous transposons of C. elegans can be
mobilized to generate new insertional mutations. In addition, foreign transposons
such as Mos1 can be introduced by transformation, and then mobilized to create
new insertions. All these newly generated transposon insertions can be named as
simple mutations, with an optional suffix indicating the nature of the transposon.
They are treated as alleles of named genes if they are located within the
boundaries of a gene. Example: r293 is a Tc1 insertion in the gene unc-54.
An optional descriptor can also be added after a double colon to indicate the
nature of the insertion. Example: unc-54(r293::Tc1).
-
Note that such insertions may often be silent in terms of gene activity, for example
if an insertion occurs within an intron and can be spliced out.
-
Newly generated transposon insertions, especially those located in apparently
intergenic regions, may also be given Ti (transposon insertion) names.
These consist of a prefix identifying the laboratory of origin, the two letters
Ti, and a number, all italicized. Example: eTi13 is an
insertion of a Mos transposon into an intergenic region on LGIII.
-
Transposon loci have ID names formed from 'WBTransposon' followed by a unique number, like WBTransposon00000623.
-
Their exon-like structure is curated as a Transposon_CDS object with a name like C29E6.6 formed from the YAC or cosmid or clone they are on followed by a number which uniquely identifies it from the other CDS-like objects on that clone, YAC or cosmid.
-
Transposons and Transposon_CDS objects are not currently classed as genes in WormBase and so do not have a parent gene object. The WBTransposon object and its representation on the Genome Browser should be viewed as analogous to the WBGene object and how it is displayed.
Last edited by <a href="/resources/person/WBPerson2970" class="person-link" title="">Mary Ann Tuli</a> – 1 year ago
Strains
-
A strain is a set of individuals of a particular genotype with the
capacity to produce more individuals of the same genotype. Strains
are given nonitalicized names consisting of two or three uppercase
letters followed by a number. The strain letter prefixes refer to
the laboratory of origin and are distinct from the mutation letter
prefixes. Examples: CB1833 is a strain of genotype dpy-5(e61) unc-13(e51),
originally constructed by S. Brenner at the MRC Laboratory of
Molecular Biology (strain prefix CB, allele prefix e),
and MT688 is a strain of genotype unc-32(e189) +/+ lin-12(n137) III; him-5(e1467) V,
constructed in the laboratory of H.R. Horvitz at M.I.T. (strain prefix MT, allele prefix n).
-
Strain prefixes are listed at the <a href="http://www.cbs.umn.edu/cgc/lab-code">CGC</a>.
-
Strains can and should be preserved as frozen stocks at -70 °C or ideally in
liquid nitrogen, in order to ensure long-term maintenance and to avoid drift
or accumulation of modifier mutations.
-
Bacterial strain names employ the two or three letter Laboratory/Strain designation, followed by “b”. For example, CBb###.
This facilitates distinguishing nematode strains from bacterial strains. Please provide full information on species and relevant genotype of
the bacteria.
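The strain naming rules above lend themselves to a simple check. The sketch below assumes exactly what is stated here: two or three uppercase letters followed by a number for nematode strains, with a lowercase "b" between the letters and the number for bacterial strains. `parse_strain` is a hypothetical helper, and the reference strain name N2 does not fit this pattern, so the sketch rejects it.

```python
import re

# lab/strain code, optional 'b' marking a bacterial strain, number
_STRAIN = re.compile(r"([A-Z]{2,3})(b?)(\d+)")


def parse_strain(name: str) -> dict:
    """Split a strain name into lab code, organism type and number."""
    match = _STRAIN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised strain name: {name!r}")
    lab_code, bacterial, number = match.groups()
    return {
        "lab_code": lab_code,
        "organism": "bacterium" if bacterial else "nematode",
        "number": int(number),
    }
```

`parse_strain("MT688")` reports a nematode strain from the laboratory with code `MT`, whereas a name such as `CBb12` (an invented example) would be reported as a bacterial strain.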
Last edited by <a href="/resources/person/WBPerson2970" class="person-link" title="">Mary Ann Tuli</a> – 349 days ago
CDS
-
Coding Sequences (CDSs) are the only part of a Gene's structure that is manually curated in WormBase. The structure of the Gene and its transcripts are derived from the structure of their CDSs.
-
CDSs have a Sequence Name that is derived from the same Sequence Name as their parent Gene object, so the gene F38H4.7 has a CDS called F38H4.7.
-
The CDS specifies coding exons in the gene from the START (Methionine) codon up to (and including) the STOP codon.
-
Any gene can code for multiple proteins as a result of alternative splicing.
-
These isoforms have a name that is formed from the Sequence Name of the gene with a unique letter appended.
-
In the case of the gene bli-4 there are 6 known CDS isoforms, called K04F10.4a, K04F10.4b, K04F10.4c, K04F10.4d, K04F10.4e and K04F10.4f.
-
It is common to refer to isoforms in the literature using the Approved gene family name with a letter appended, for example pha-4a. However, this has no meaning within the WormBase database, and a search for pha-4a in WormBase will not return anything. The correct name of this isoform is pha-4, isoform a.
Last edited by <a href="/resources/person/WBPerson1983" class="person-link" title="">Paul Davis</a> – 3 years ago
Mitochondrial Genome
The mitochondrial genotype of a worm can be expressed using the standard
nomenclature, using M as the abbreviation for the mitochondrial
linkage group. The mitochondrial genotype is written as the last element
in the genotype, following the nuclear genotype. Heteroplasmic combinations,
where mitochondria of different genotypes co-exist in the same cytoplasm,
can be expressed using a double forward slash, //. For example: uaDf5//+.
Last edited by <a href="/resources/person/WBPerson1983" class="person-link" title="">Paul Davis</a> – 3 years ago
RFLPs and SNPs
-
Polymorphic sites, which are mostly RFLPs (restriction fragment length
polymorphisms) or SNPs (single nucleotide polymorphisms), are designated
by an italic letter P and an italic number, preceded by the allele
prefix for the laboratory responsible for identifying the site.
Examples: stP17 and stP196 are RFLPs identified in the
laboratory of R. H. Waterston, amP6 and amP15 are SNPs
identified in the laboratory of K. Kornfeld.
Last edited by <a href="/resources/person/WBPerson10214" class="person-link" title="">Abigail Cabunoc</a> – 4 years ago
Proteins
-
The protein product of a gene can be referred to by the relevant gene name,
written in non-italic capitals, e.g., the protein encoded by unc-13
can be called UNC-13.
-
Where more than one protein product is predicted for a
gene (usually as a result of alternative message processing), the
different proteins are distinguished by adding 'isoform' and then the isoform letter derived from the isoform letter of the name of the WormBase CDS, e.g., the gene 'tra-1' has two CDS isoforms: 'Y47D3A.6a' and 'Y47D3A.6b' which give rise to the protein isoforms: 'TRA-1, isoform a' and 'TRA-1, isoform b'.
-
Mutant protein products can be named by the missense change, for example a
mutant 'TRA-1, isoform a' protein with a Pro to Leu change at codon 79 would be written: 'TRA-1, isoform a (P79L)'.
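As a sketch of the missense notation, the helper below converts three-letter amino acid abbreviations to the standard one-letter codes and appends the change to a Protein Name. The function name `mutant_protein_name` is hypothetical; the one-letter codes are the standard ones for the 20 common amino acids.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def mutant_protein_name(protein: str, wild_type: str, codon: int, mutant: str) -> str:
    """Append a missense change such as (P79L) to a Protein Name."""
    change = f"{THREE_TO_ONE[wild_type]}{codon}{THREE_TO_ONE[mutant]}"
    return f"{protein} ({change})"
```

Calling `mutant_protein_name("TRA-1, isoform a", "Pro", 79, "Leu")` reproduces the example above, `TRA-1, isoform a (P79L)`.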
Last edited by <a href="/resources/person/WBPerson1983" class="person-link" title="">Paul Davis</a> – 3 years ago
Natural Copy Number Variants
Dozens of independent natural isolates of C. elegans
have been recovered, from multiple locations around the world. The genomes of
some of these isolates contain large (>10 kb) deletions, duplications or
insertions, relative to the reference wildtype strain, Bristol N2. Deletions
are named with the prefix niDf (natural isolate deficiency) followed
by a number. Duplications and insertions are named with the prefix niDp
(natural isolate duplication or insertion), followed by a number. Numbers for
niDf and niDp variants are assigned by application to:
genenames@wormbase.org
Last edited by <a href="/resources/person/WBPerson2970" class="person-link" title="">Mary Ann Tuli</a> – 1 year ago
Introgressed regions in near-isogenic lines (aka congenic lines)
Genetic regions that have been introgressed from one natural isolate of
C. elegans onto the background of a different
natural isolate are named in a manner similar to that used for deficiencies
(Df) and duplications (Dp). Each Introgressed Region is
given an italicized name consisting of the relevant laboratory mutation
prefix, the letters IR, and a number. Thus, a region from the X
chromosome of Hawaiian strain CB4856 crossed onto a Bristol N2 background,
and created in the Kruglyak lab (allele code qq) has been given
the name qqIR1. Additional information about genetic map location
and strain origin can be provided in an optional parenthesis. So this
example could be more fully written as qqIR1(X, CB4856), with
the implicit assumption that the strain background is Bristol N2. The
strain background and the direction of introgression can also be specified,
using the symbol >, with this example being written qqIR1(X, CB4856>N2).
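The three levels of detail described above can be produced by one small function. This is an illustrative sketch with a hypothetical name, `introgressed_region_name`; it simply assembles the pieces in the order given in the text.

```python
from typing import Optional


def introgressed_region_name(
    lab_prefix: str,
    number: int,
    chromosome: Optional[str] = None,
    donor: Optional[str] = None,
    background: Optional[str] = None,
) -> str:
    """Build an Introgressed Region name, optionally with map and strain details."""
    name = f"{lab_prefix}IR{number}"
    if chromosome is None:
        return name
    details = chromosome
    if donor is not None:
        details += f", {donor}"
        if background is not None:
            # '>' gives the direction of introgression: donor > background
            details += f">{background}"
    return f"{name}({details})"
```

With the Kruglyak lab example, the function returns `qqIR1`, `qqIR1(X, CB4856)` or `qqIR1(X, CB4856>N2)` depending on how many optional arguments are supplied.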
Last edited by <a href="/resources/person/WBPerson241" class="person-link" title="">Todd Harris</a> – 4 years ago
Transgenes
-
Transformation of C. elegans with
exogenous DNA by microinjection usually leads to the formation
of a transmissible extrachromosomal array containing many copies
of the introduced DNA. Sometimes chromosomal integration of the
introduced DNA can occur, or an existing extrachromosomal array
can be integrated after irradiation of a transgenic line.
-
Extrachromosomal arrays are given italicized names consisting
of the laboratory allele prefix, the two letters Ex,
and a number.
-
Integrated transgenes are designated by italicized names consisting of the laboratory allele prefix, the two letters Is, and a number. Single copy integrants, usually generated by the MosSCI or miniMos insertion techniques, are a subset of integrated transgenes and are designated by italicized names consisting of the laboratory allele prefix, the two letters Si, and a number.
-
Transgene designations Ex, Is and Si can optionally be followed by genotypic or molecular information describing the transgene, in square brackets. For example, eEx3 or eIs2 or stEx5[sup-7(st5) unc-22(+)].
-
Gene fusions incorporated in transgenes that consist of a
C. elegans gene or part thereof
fused to a reporter such as lacZ or GFP are indicated by the C. elegans
gene name followed by two colons and the reporter, all italicized:
pes-1::lacZ, mab-9::GFP. No specific recommendations
have been made for distinguishing between transcriptional and
translational fusions.
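The transgene designations in this section can be parsed in the same spirit as allele names. The following is a sketch under the rules stated here, with hypothetical names (`parse_transgene`, `reporter_fusion`) and the assumption that the laboratory allele prefix is one or two lowercase letters.

```python
import re

TRANSGENE_KINDS = {
    "Ex": "extrachromosomal array",
    "Is": "integrated transgene",
    "Si": "single-copy integrant",
}

# allele prefix, Ex/Is/Si, number, optional description in square brackets
_TRANSGENE = re.compile(r"([a-z]{1,2})(Ex|Is|Si)(\d+)(?:\[(.*)\])?")


def parse_transgene(name: str) -> dict:
    """Split a transgene designation into its components."""
    match = _TRANSGENE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised transgene name: {name!r}")
    prefix, kind, number, contents = match.groups()
    return {
        "lab_prefix": prefix,
        "kind": TRANSGENE_KINDS[kind],
        "number": int(number),
        "contents": contents,  # None when no bracketed description is given
    }


def reporter_fusion(gene: str, reporter: str) -> str:
    """Write a gene fusion to a reporter, e.g. pes-1::lacZ."""
    return f"{gene}::{reporter}"
```

For example, `parse_transgene("stEx5[sup-7(st5) unc-22(+)]")` reports an extrachromosomal array from the `st` laboratory whose contents are `sup-7(st5) unc-22(+)`.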
Last edited by <a href="/resources/person/WBPerson2970" class="person-link" title="">Mary Ann Tuli</a> – 36 days ago
Genotypes
-
The genotype of an animal is specified by listing all known differences
between its genotype and that of wild type, which is defined by convention
as Bristol N2. Each such difference is assigned a unique name. The currently
recognized types of difference, described at greater length elsewhere in
these guidelines, are:
-
Simple mutations. Example: e2123.
-
New transposon insertions. Example: eTi13.
-
Sequence polymorphisms. Example: stP17.
-
Transgenes (extrachromosomal arrays). Example: stEx5.
-
Transgenes (chromosomally inserted). Example: mdIs18.
-
Chromosomal aberrations (duplications, deficiencies, inversions,
translocations, and crossover suppressors). Examples: nDp17, uaDf5, hIn1, eT1, mnC1.
-
Where necessary, wild type sequence can be indicated using the symbol +.
-
Because every genetic "feature" (i.e., difference from Bristol N2) has a unique
name, an animal's genotype is fully specified by listing all the named features
that it carries. Example: e2123; mdIs18.
-
For clarity and convenience, additional information about genes, chromosomes,
transgene contents, etc., can be added as described elsewhere in this document,
to produce a more informative genotype. Example: pha-1(e2123ts) III; mdIs18[pha-1(+) unc-17::GFP]
-
Mutants carrying more than one mutation are designated by sequentially
listing mutant genes or mutations according to the left-right (= up-down)
order on the genetic map. Different linkage groups are separated by a
semicolon and given in the order I, II, III, IV, V, X, f, M. I-V
are the five autosomes, X is the X chromosome, f
refers to free duplications or chromosomal fragments, and M
is the mitochondrial genome. For example: dpy-5(e61) I; bli-2(e768) II; unc-32(e189) III.
-
Heterozygotes, with allelic differences between chromosomes, are designated by
separating mutations on the two homologous chromosomes with a slash. Where
unambiguous, wild-type alleles can be designated by a plus sign alone, or even
omitted. For example, dpy-5(e61) unc-13(+)/dpy-5(+) unc-13(e51) I can
also be written dpy-5 +/+ unc-13 or dpy-5/unc-13.
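The ordering and separator rules for genotypes can be illustrated with a short sketch. The function names `format_genotype` and `format_heterozygote` are hypothetical, and the code assumes that mutations within each linkage group are already supplied in left-right map order and that every group is followed by its label, as in the examples above.

```python
from typing import Dict, List

# Order in which linkage groups are listed in a genotype.
LINKAGE_ORDER = ["I", "II", "III", "IV", "V", "X", "f", "M"]


def format_genotype(mutations_by_group: Dict[str, List[str]]) -> str:
    """Join mutations into one genotype string, groups separated by '; '."""
    unknown = set(mutations_by_group) - set(LINKAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown linkage group(s): {sorted(unknown)}")
    parts = []
    for group in LINKAGE_ORDER:
        mutations = mutations_by_group.get(group)
        if mutations:
            parts.append(" ".join(mutations) + " " + group)
    return "; ".join(parts)


def format_heterozygote(homolog_a: List[str], homolog_b: List[str], group: str) -> str:
    """Separate the two homologous chromosomes with a slash."""
    return f"{' '.join(homolog_a)}/{' '.join(homolog_b)} {group}"
```

Calling `format_genotype({"III": ["unc-32(e189)"], "I": ["dpy-5(e61)"], "II": ["bli-2(e768)"]})` reorders the groups and gives `dpy-5(e61) I; bli-2(e768) II; unc-32(e189) III`.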
RNA Molecules
-
Messenger RNA species can be written by using the protein product
as a descriptor, for example TRA-1A mRNA, TRA-1B mRNA, in order
to allow distinction between different splice variants.
-
Non-coding RNA species can be written using the gene name as a
descriptor, for example lin-4 RNA. Small RNA species
derived from mir genes (micro-RNAs) can be written miR-,
followed by a number corresponding to the mir gene.
Example: miR-2 for the RNA derived from mir-2.
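As a small illustration of the RNA naming conventions, the following sketch derives the written names from a gene or protein name. The helper names are hypothetical, and the code assumes the simplest case of a mir gene name of the form mir-<number>.

```python
import re

# Assumption: only plain "mir-<digits>" gene names are handled here.
MIR_GENE_RE = re.compile(r"^mir-(\d+)$")


def micro_rna_name(mir_gene: str) -> str:
    """Name of the small RNA derived from a mir gene, e.g. mir-2 -> miR-2."""
    m = MIR_GENE_RE.match(mir_gene)
    if m is None:
        raise ValueError(f"not a mir gene name: {mir_gene!r}")
    return f"miR-{m.group(1)}"


def mrna_name(protein_product: str) -> str:
    """Messenger RNA described by its protein product, e.g. TRA-1A mRNA."""
    return f"{protein_product} mRNA"


def noncoding_rna_name(gene: str) -> str:
    """Non-coding RNA described by its gene name, e.g. lin-4 RNA."""
    return f"{gene} RNA"
```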
Phenotypes
-
Phenotypic characteristics can be described in words, e.g.,
dumpy animals or uncoordinated animals. If more convenient,
a non-italicized 3-letter or 4-letter abbreviation, which
usually corresponds to a gene class or gene name, may be used.
The first letter of a phenotypic abbreviation is capitalized,
e.g., Unc for uncoordinated, Dpy for dumpy. If necessary to
distinguish among related but distinguishable phenotypes, the
relevant gene number can be added, e.g., Unc-4 and Unc-13
to differentiate the distinct phenotypes produced
by mutations in the two genes unc-4 and unc-13.
WormBase maintains a standard set of defined phenotype descriptors (the <a href="http://wormbase.org/db/misc/phenotype">WormBase Phenotype Ontology</a>).
-
Abbreviations that do not correspond to a gene class or
gene name can also be used, e.g., Muv for multiple vulval development,
and Daf-c for dauer-formation-constitutive. Assignment of phenotype abbreviations
not corresponding to a gene name is controlled by WormBase, and requests for
names should be made before publication, via email to <A HREF="mailto:genenames@wormbase.org">genenames@wormbase.org</A>.
-
A common and accepted convention, when comparing a mutant
with the wild-type, is to use the prefix non- to refer to
the wild-type phenotypes, for example, non-Lin (= wild type
cell lineage) or Dpy non-Unc (= wild type with respect to
movement, but dumpy with respect to body shape).
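The abbreviation rules for gene-derived phenotypes can be sketched as follows. The function names are hypothetical, and the sketch covers only abbreviations that correspond to a 3-letter or 4-letter gene class or gene name; abbreviations such as Muv or Daf-c, which are assigned by WormBase, are outside its scope.

```python
def phenotype_abbreviation(gene_or_class: str, keep_number: bool = False) -> str:
    """Capitalize the gene class; optionally keep the gene number.

    'unc-4' -> 'Unc', or 'Unc-4' when keep_number is True.
    """
    gene_class, _, number = gene_or_class.partition("-")
    if not (gene_class.isalpha() and len(gene_class) in (3, 4)):
        raise ValueError(f"expected a 3- or 4-letter gene class: {gene_or_class!r}")
    abbreviation = gene_class[0].upper() + gene_class[1:]
    if keep_number and number:
        abbreviation += "-" + number
    return abbreviation


def non_phenotype(abbreviation: str) -> str:
    """Wild type with respect to the given phenotype, e.g. non-Unc."""
    return "non-" + abbreviation
```

For example, `phenotype_abbreviation("dpy-5") + " " + non_phenotype(phenotype_abbreviation("unc-13"))` produces `Dpy non-Unc`.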
RNAi Phenotypes
-
Animals in which an endogenous gene has been down-regulated
by RNA interference (RNAi), after exposure to double-stranded
RNA corresponding to that gene, can be referred to as mutants,
using italicized RNAi as the mutation name. Example:
mog-4(RNAi), C08F8.8(RNAi).
-
Phenotypes induced by RNAi can be named using conventional
mutant phenotype descriptors, such as Unc, Muv, Fem. For
high-throughput RNAi screens, which may detect only conspicuous
phenotypes, the more general phenotype descriptors could be used
(see the <a href="http://wormbase.org/db/misc/phenotype">Phenotype Ontology</a>).
New methods for genome engineering (TALENs, CRISPR-Cas9, etc.) are increasingly being applied to C. elegans. These entail some additional recommendations to the standard Genetic Nomenclature Guidelines, as described below. The aim is to provide compact and unambiguous ways of describing and referring to engineered changes to endogenous loci, as distinct from transgenic constructs that are inserted elsewhere in the genome.
Other Nematodes
Research and genomic analysis of non-C. elegans species is increasing rapidly. An important mission of WormBase is to make available information for each species listed in the Overview section, within the database structure developed for C. elegans. For these organisms, gene naming will also be supervised by WormBase, in order to maximize consistency with C. elegans. It is recommended that nomenclature in general should follow the principles used for C. elegans, as far as possible. Gene name proposals and queries should be made online via <a href="http://minerva.caltech.edu/~azurebrd/cgi-bin/forms/gene_name.cgi">WormBase</a> or sent to <A HREF="mailto:genenames@wormbase.org">genenames@wormbase.org</A>.
Species prefixes
In order to unambiguously specify the nematode species-of-origin, an optional 3-letter standard prefix and hyphen can be added to the gene name. Examples: the C. briggsae and Pristionchus pacificus orthologs of C. elegans tra-1 are called Cbr-tra-1 and Ppa-tra-1, respectively. WormBase coordinates the species prefix designations, to avoid the use of the same designation for more than one species; contact <A HREF="mailto:genenames@wormbase.org">genenames@wormbase.org</A> for prefix proposals.
Prefixes so far used include:
Species | Prefix
C. elegans | Cel-
C. briggsae | Cbr-
C. remanei | Cre-
C. brenneri | Cbn-
C. japonica | Cpj-
Heterorhabditis bacteriophora | Hba-
Oscheius tipulae | Oti-
Pristionchus pacificus | Ppa-
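The prefix table lends itself to a simple lookup. The sketch below copies the prefixes exactly as listed above; the dictionary and function names are hypothetical, and the lookup is case-sensitive on the assumption that the capitalized prefix is what distinguishes it from a lowercase gene class.

```python
from typing import Optional, Tuple

# Prefixes as listed in the table above, stored without the trailing hyphen.
SPECIES_PREFIX = {
    "C. elegans": "Cel",
    "C. briggsae": "Cbr",
    "C. remanei": "Cre",
    "C. brenneri": "Cbn",
    "C. japonica": "Cpj",
    "Heterorhabditis bacteriophora": "Hba",
    "Oscheius tipulae": "Oti",
    "Pristionchus pacificus": "Ppa",
}
PREFIX_TO_SPECIES = {prefix: species for species, prefix in SPECIES_PREFIX.items()}


def add_species_prefix(species: str, gene: str) -> str:
    """Prepend the species prefix and a hyphen to a gene name."""
    return f"{SPECIES_PREFIX[species]}-{gene}"


def split_species_prefix(name: str) -> Tuple[Optional[str], str]:
    """Return (species, gene); species is None when no known prefix is present."""
    head, _, rest = name.partition("-")
    if rest and head in PREFIX_TO_SPECIES:
        return PREFIX_TO_SPECIES[head], rest
    return None, name
```

Because the prefix is optional, `split_species_prefix("tra-1")` returns `(None, "tra-1")` rather than guessing a species.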
Gene naming: Homologous genes
Genes predicted from whole genome sequences in other nematode species will, in many cases, have identifiable close homologs in C. elegans, for which approved names already exist. In these cases, the same name should be used as in C. elegans, with the relevant species identifier.
Possible scenarios:
- One-to-one: Where one gene in C. elegans corresponds to a single gene in another nematode species, ortholog naming can be applied automatically. Example: thoc-1 in C. elegans has a C. briggsae ortholog, Cbr-thoc-1.
- One-to-many: Where one gene in C. elegans is related to multiple genes (paralogs) in another nematode species, these paralogs can be named using additional decimal numbers. Example: thoc-3 in C. elegans has two C. briggsae paralogs, Cbr-thoc-3.1 and Cbr-thoc-3.2.
- Many-to-one: Where multiple genes exist in C. elegans, but only a single gene in another nematode species, it is recommended that either the most closely similar, or the lowest numbered C. elegans gene, be used to name the single gene, as appropriate.
- Many-to-many: Where multiple closely related genes can be identified in both species, but the phylogenetic relationships of the two sets are complex, new gene numbers can be assigned to the set of genes in the other nematode species, after consultation with <A HREF="mailto:genenames@wormbase.org">genenames@wormbase.org</A>.
In cases where a standard gene name has not yet been assigned in C. elegans, the gene can be referred to using the cosmid.number identifier for the C. elegans gene, preceded by a species prefix. Example: the ortholog of C. elegans W01B11.3 in Heterorhabditis bacteriophora can be referred to as Hba-W01B11.3. However, in such cases it will usually be both feasible and desirable to assign a standard name to the C. elegans gene as well, at the same time.
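The one-to-one and one-to-many scenarios are mechanical enough to sketch in code. The function `name_orthologs` below is hypothetical and covers only those two cases; the many-to-one and many-to-many scenarios depend on similarity judgments or consultation with WormBase and are deliberately left out.

```python
from typing import List


def name_orthologs(cel_gene: str, species_prefix: str, n_copies: int) -> List[str]:
    """Propose names for the gene(s) in another species matching one C. elegans gene.

    One copy keeps the C. elegans name; several copies get decimal suffixes.
    cel_gene may be a standard name (thoc-1) or a cosmid.number identifier.
    """
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    base = f"{species_prefix}-{cel_gene}"
    if n_copies == 1:
        return [base]
    return [f"{base}.{i}" for i in range(1, n_copies + 1)]
```

With the examples from the text, `name_orthologs("thoc-1", "Cbr", 1)` gives `["Cbr-thoc-1"]` and `name_orthologs("thoc-3", "Cbr", 2)` gives `["Cbr-thoc-3.1", "Cbr-thoc-3.2"]`.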
Gene naming: Non-homologous genes
It is expected that many genes in other nematode species will lack obvious close homologs in C. elegans, because of loss or substantial divergence during the evolution of C. elegans. These genes can be given new gene numbers, if they belong to an identifiable named class in C. elegans, or else new gene name classes can be established for them. In either case, assignment of an approved name should be made after consultation with <A HREF="mailto:genenames@wormbase.org">genenames@wormbase.org</A>.
Gene naming: Forward genetics
A significant amount of mutation-based forward genetic analysis is
being pursued in nematodes other than C.
elegans, in particular using other species of Caenorhabditis (C.
briggsae, C. remanei, C. brenneri and others), as well as
species of Oscheius and Pristionchus. It is expected that most, but
not all, of the mutationally-defined genes discovered in these species
will prove to have orthologs with equivalent or similar function in
C. elegans, and hence that standard
genetic names will have been approved already. Several situations
can arise: