Difference between revisions of "UserGuide:Nomenclature"

From WormBaseWiki
[[File:Warning.jpg|100px]]
 
 
 
= Genetic Nomenclature for ''Caenorhabditis elegans''  =
 
<span style="color:red">
 
'''Warning:''' The content of this page has been migrated under [http://www.wormbase.org/about/userguide/nomenclature nomenclature] on the new www.wormbase.org site.
 
</span>
 
 
Genetic nomenclature for ''Caenorhabditis elegans'' is supervised by [http://wormbase.org/ WormBase] in collaboration with the [http://www.cbs.umn.edu/CGC/ ''Caenorhabditis'' Genetics Center (CGC)].
 
 
== How to Register a New Gene Class or Gene Name  ==
 
 
Investigators wishing to register new gene names for ''C. elegans'' should note the summary guidelines below and apply online via [http://minerva.caltech.edu/~azurebrd/cgi-bin/forms/gene_name.cgi WormBase] or by email to '''genenames@wormbase.org'''.
 
 
== How to Register a New Laboratory and Receive Lab, Strain and Allele designations  ==
 
 
Specific identifying codes (CGC designations) are assigned to each laboratory engaged in dedicated long-term genetic research on ''C. elegans''. Each such laboratory is assigned a lab/strain code, for naming strains, and an allele code, for naming mutations and transgenes. These codes are listed at the [http://biosci.umn.edu/CGC/nomenclature/code.html CGC] and on the [http://www.wormbase.org/wiki/index.php/Nomenclature WormBase wiki].
 
 
Investigators requiring new CGC designations should apply by email to '''jonathan.hodgkin@bioch.ox.ac.uk'''.
 
 
= Summary Guidelines for Proposing New Gene Names  =
 
 
#Gene names must conform to the standard format of three or four letters, a hyphen, and a number (e.g., ''dpy-5'').
 
#Genes can be named on the basis of a mutant phenotype or on the basis of the predicted protein product or RNA product.
 
#If a new gene clearly belongs in an existing gene class (of which more than 2000 now exist), then a new gene number will be assigned after consultation with the laboratory responsible for the gene class in question. Gene classes and the corresponding assigning laboratory for each gene class are listed at the [http://biosci.umn.edu/CGC/nomenclature/genes.html CGC] and on the [http://www.wormbase.org/wiki/index.php/Nomenclature WormBase wiki].
 
#If establishing a new gene class seems more appropriate, approval for the new gene class name must be obtained, preferably online via [http://minerva.caltech.edu/~azurebrd/cgi-bin/forms/gene_name.cgi WormBase] or by email to '''genenames@wormbase.org'''.
 
#Gene names based on homology with a previously named gene in another well-studied organism, such as ''Saccharomyces cerevisiae'' or ''Mus musculus'', are often appropriate and desirable, especially where there is convincing orthology between genes.
 
#Gene names and gene numbering schemes that conform to established nomenclature proposals for particular protein classes are desirable.
 
#Gene names that are memorable, informative and simply explained are encouraged.
 
#Gene names based solely on RNAi phenotypes or high-throughput analysis of gene expression or protein interactions are discouraged.
 
#Gene names including ''c'' (for ''Caenorhabditis''), ''ce'' (for ''C. elegans''), ''n'' (for nematode) or ''w'' (for worm) are discouraged. ''C. elegans'' as the organism of origin can be specified with a prefix (''Cel''-) if desired.
 
#New gene name classes can be assigned in confidence, prior to formal publication or disclosure in an abstract.
 
 
= Standard Genetic Nomenclature Recommendations  =
 
 
This summary is based on the original proposals for ''C. elegans'' nomenclature (Horvitz ''et al''., 1979, ''Mol. Gen. Genet''. '''175''': 129-133), plus additional recommendations that have been distributed in ''The Worm Breeder's Gazette'' or posted on [http://www.wormbase.org WormBase].
 
 
== Genetic Loci  ==
 
 
*Genes are given names consisting of three or four italicized letters, a hyphen, and an italicized Arabic number, e.g., ''dpy-5'' or ''let-37'' or ''mlc-3''. The gene name may be followed by an italicized Roman numeral to indicate the linkage group on which the gene maps, e.g., ''dpy-5 I'' or ''let-37 X'' or ''mlc-3 III''.
 
 
*For genes defined by mutation, the gene names refer to the mutant phenotype originally detected or most easily scored, e.g.:
 
**dumpy (''d''um''py'') in the case of ''dpy-5''
 
**lethal (''let''hal) in the case of ''let-37''.
 
 
*For genes defined on the basis of sequence similarity or sequence features, the gene name refers to the predicted protein product or RNA product, e.g.:
 
**''m''yosin ''l''ight ''c''hain in the case of ''mlc-3''

**''s''uper''o''xide ''d''ismutase in the case of ''sod-1''

**''NPHP'' (gene for the human kidney disease nephronophthisis) in the case of ''nph-4''

**''r''ibosomal ''RN''A in the case of ''rrn-1''
 
 
*Genes with related properties are usually given the same three-letter name and different numbers. For example, there are three known myosin light chain genes: ''mlc-1, mlc-2, mlc-3'', and more than twenty different dumpy genes: ''dpy-1, dpy-2, dpy-3'', and so on.
 
 
*Genes can be given names corresponding to homologous named genes in other standard genetic organisms, e.g.:

**''rnt-1'' is the ''C. elegans'' ortholog of the ''Drosophila'' gene ''runt''.
 
**''wrn-1'' is the ''C. elegans'' ortholog of the human gene WRN1, responsible for Werner's syndrome.
 
 
*Gene names that are memorable, informative and simply explained are encouraged.
 
 
*Genes in a paralogous set related to a single named gene in another organism are sometimes given the same gene name and number, followed by a distinguishing decimal. e.g. four ''C. elegans'' genes homologous to ''SIR2'' in ''S. cerevisiae'' have been given the names ''sir-2.1, sir-2.2, sir-2.3, sir-2.4''.
 
 
*Pseudogenes, for which there is good evidence that no functional product is ever generated, can be indicated by adding the optional italic suffix ''ps'' to the gene name, as in ''msp-48ps''.
 
 
*Gene names based solely on RNAi phenotypes or high-throughput analysis of gene expression or protein interaction are discouraged.
 
 
*Gene names including ''c'' (for ''Caenorhabditis''), ''ce'' (for ''C. elegans''), ''n'' (for nematode) or ''w'' (for worm) are discouraged. Instead, an optional prefix ''Cel''- can be added to indicate the species origin.
 
 
*A limited number of genes have been given temporary ''tag''- names (''tag'' = ''t''emporarily ''a''ssigned ''g''ene name). These are genes for which deletion alleles have been generated by reverse genetic methods, but which have not yet been given more informative names based on sequence or mutant phenotype. When sufficient information becomes available, each ''tag''- name will be replaced by an appropriate standard 3-letter or 4-letter name.
 
 
*A limited number of genes, named on the basis of sequence homology, have been given non-standard names ending with alphanumeric identifiers rather than with simple numbers, in order to make these names closer to the generally accepted names used in other organisms. e.g. ''eif-3.B, eif-3.C'' encode proteins of the conserved translation factor eIF3.
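The format rules in this section can be checked mechanically. The following Python sketch is an illustration only, not an official WormBase or CGC validator: the regular expression and the function name <code>parse_gene_name</code> are assumptions, and the pattern covers only the cases listed above (plain names, paralog decimals, alphanumeric identifiers such as ''.B'', the ''ps'' suffix and an optional linkage group).

```python
import re

# Illustrative pattern (an assumption, not an official validator):
# 3-4 lowercase letters, a hyphen and an Arabic number, optionally followed by
# a paralog decimal (".1") or an alphanumeric identifier (".B"), the pseudogene
# suffix "ps", and a Roman-numeral linkage group separated by a space.
GENE_NAME = re.compile(
    r"^(?P<gene_class>[a-z]{3,4})-(?P<number>\d+)"
    r"(?:\.(?P<variant>\d+|[A-Z]))?"
    r"(?P<pseudogene>ps)?"
    r"(?: (?P<linkage_group>I|II|III|IV|V|X))?$"
)


def parse_gene_name(name: str):
    """Return the components of a gene name as a dict, or None if it does not fit."""
    match = GENE_NAME.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["number"] = int(parts["number"])
    parts["pseudogene"] = parts["pseudogene"] is not None
    return parts


for example in ["dpy-5 I", "sir-2.1", "msp-48ps", "eif-3.B", "dpy5", "Dpy-5"]:
    print(example, "->", parse_gene_name(example))
```

Species-prefixed names such as ''Cbr-tra-1'' are deliberately rejected by this pattern and would need their prefix split off first.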
 
 
=== Gene Name Conflicts  ===
 
 
Gene names that have been established in the published literature and databases should preferably not be changed. In cases where a gene has received multiple names, one name will be adopted as the main name for the gene. Other names will continue to be listed in databases. Whenever possible, name changes or the adoption of a single main name should be made with the approval of all laboratories concerned.
 
 
=== Homologous Genes  ===
 
 
If a homolog of a known ''C. elegans'' gene is identified in a related species such as ''Caenorhabditis briggsae'', this can be given the same gene name, preceded by three italic letters referring to the species, and a hyphen. For example, ''Cbr-tra-1'' is the name for the ''C. briggsae'' homolog of the ''C. elegans'' gene ''tra-1''. The ''C. elegans'' homolog of a gene identified and named in another organism can be distinguished by the same convention, using "Cel-" as an optional prefix. For example, ''Cel-snt-1'' defines the ''C. elegans'' synaptotagmin gene.
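A minimal Python sketch of splitting off the optional species prefix, assuming the prefix is always one capital letter plus two lowercase letters. The function name <code>split_species_prefix</code> is hypothetical, and the lookup table holds only the two prefixes mentioned above.

```python
import re

# Assumed shape of a species prefix: one capital letter plus two lowercase
# letters, then a hyphen and an ordinary gene name.
PREFIXED_GENE = re.compile(r"^(?P<species>[A-Z][a-z]{2})-(?P<gene>[a-z]{3,4}-\S+)$")

# Only the two prefixes mentioned in the text above.
KNOWN_SPECIES = {
    "Cel": "Caenorhabditis elegans",
    "Cbr": "Caenorhabditis briggsae",
}


def split_species_prefix(name: str):
    """Return (species, gene); species is None when no prefix is present."""
    match = PREFIXED_GENE.match(name)
    if match is None:
        return None, name
    # Unknown prefixes are returned as written rather than expanded.
    return KNOWN_SPECIES.get(match["species"], match["species"]), match["gene"]


print(split_species_prefix("Cbr-tra-1"))  # ('Caenorhabditis briggsae', 'tra-1')
print(split_species_prefix("tra-1"))      # (None, 'tra-1')
```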
 
 
=== Alleles and Mutations  ===
 
 
*Every mutation has a unique designation. Mutations are given names consisting of one or two italicized letters followed by an italicized Arabic number, e.g., ''e61'' or ''mn138'' or ''st5''. The letter prefix refers to the laboratory of isolation, as registered with the [http://www.cbs.umn.edu/CGC/nomenclature/labhead.html CGC]. There are currently more than 500 registered laboratories. For example, ''e'' originally referred to the MRC Laboratory of Molecular Biology (Cambridge, U.K.) and now refers to the laboratory of J. Hodgkin (University of Oxford), and ''st'' refers to the laboratory of R.H. Waterston (originally at Washington University, St. Louis, MO, now at the University of Washington, Seattle).
 
 
*When gene and mutation names are used together, the mutation name is included in parentheses after the gene name, e.g., ''dpy-5(e61), let-37(mn138)''. When unambiguous (e.g., if only one mutation is known for a given gene or if all work on a gene described in a publication used a single mutation cited in a Methods section), gene names are used in preference to mutation names (''let-37'' rather than ''mn138'' or ''let-37(mn138)'').
 
 
*Optional suffixes indicating characteristics of a mutation can follow a mutation name. These usually consist of two nonitalicized letters, e.g., ''hc17''ts, where ts stands for temperature-sensitive, or ''pk15''te, where te stands for transposon excision.
 
 
*Mutations created by ''in vitro'' mutagenesis should receive standard allele names. Where a pre-existing genomic mutation is re-created by ''in vitro'' mutagenesis, it is still desirable to give the new mutation a new name.
 
 
*The wild-type allele of a gene is defined as that present in the Bristol N2 strain, stored frozen at the CGC and other locations. Wild-type alleles can be designated by a plus sign immediately after the gene name, ''dpy-5+'', or, more commonly, by including the plus sign in parentheses, ''dpy-5(+)''.
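The allele conventions above can be sketched as a small parser. This is an illustrative Python example under stated assumptions (the function <code>parse_allele</code> and its patterns are hypothetical): it accepts a bare mutation name with an optional two-letter suffix, or a gene name followed by a parenthesized mutation or ''(+)''. It does not look up lab prefixes in the CGC registry and does not handle intragenic double designations such as ''e245e2608''.

```python
import re

# Assumed shapes: a mutation is 1-2 lowercase lab letters, a number and an
# optional two-letter suffix ("ts", "te"); a gene name may precede it in parentheses.
MUTATION = re.compile(r"^(?P<lab>[a-z]{1,2})(?P<number>\d+)(?P<suffix>[a-z]{2})?$")
GENE_AND_ALLELE = re.compile(r"^(?P<gene>[a-z]{3,4}-[0-9A-Za-z.]+)\((?P<allele>[^()]+)\)$")


def parse_allele(text: str):
    """Parse 'e61', 'hc17ts', 'dpy-5(e61)' or 'dpy-5(+)'; return None if unrecognized."""
    match = GENE_AND_ALLELE.match(text)
    gene, allele = (match["gene"], match["allele"]) if match else (None, text)
    if allele == "+":
        # Wild-type (Bristol N2) allele written as gene(+).
        return {"gene": gene, "wild_type": True}
    mutation = MUTATION.match(allele)
    if mutation is None:
        return None
    return {
        "gene": gene,
        "wild_type": False,
        "lab": mutation["lab"],
        "number": int(mutation["number"]),
        "suffix": mutation["suffix"],
    }


print(parse_allele("dpy-5(e61)"))
print(parse_allele("hc17ts"))
```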
 
 
=== Gene Knockouts  ===
 
 
*Most gene knockouts constructed to date are small deletions (<5 kb) generated by transposon excision or by chemical mutagenesis. These are named as alleles, sometimes with the optional suffix te (transposon-excision) or ko (knockout). Example: ''zyx-1(gk190)'' is a 777 bp deletion in the ''zyx-1'' gene.
 
 
*Some knockouts have been made by insertion of a selectable marker, such as ''unc-119(+)''. These are named as alleles, with an optional descriptor defining the selected marker following the unique allele name, and preceded by a double colon. Example: ''jf61 = zhp-3(jf61::unc-119+)''
 
 
*Some of the small deletions generated by reverse genetic methods may remove parts of two adjacent genes. If only two genes appear to be affected, then the deletion is given a single allele name, but the genotype is written with both gene names coupled with an ampersand (&amp;). Example: allele ''ok615'' is a 1422 bp deletion of two adjacent genes, so it can be written ''rad-54&tag-157(ok615)''.
 
 
*Deletions that affect more than two genes are named as Deficiencies ''(Df)'', as described in the Chromosomal Aberrations section.
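A sketch of parsing the knockout forms described above: two genes joined by ''&'', and an inserted marker given after ''::''. The function name <code>parse_knockout</code> is hypothetical, and the pattern assumes the marker descriptor itself contains no parentheses, as in the ''unc-119+'' example.

```python
import re

# Assumed shape: one or more gene names joined by "&", then an allele in
# parentheses, optionally followed by "::" and an inserted-marker descriptor.
KNOCKOUT = re.compile(
    r"^(?P<genes>[^()]+)\((?P<allele>[a-z]{1,2}\d+)(?:::(?P<descriptor>[^()]+))?\)$"
)


def parse_knockout(text: str):
    """Split a knockout genotype into its genes, allele and optional marker descriptor."""
    match = KNOCKOUT.match(text)
    if match is None:
        return None
    return {
        "genes": match["genes"].split("&"),
        "allele": match["allele"],
        "descriptor": match["descriptor"],  # None when no "::" part is present
    }


print(parse_knockout("rad-54&tag-157(ok615)"))
print(parse_knockout("zhp-3(jf61::unc-119+)"))
```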
 
 
=== Modifiers: Suppressors, Revertants and Enhancers  ===
 
 
*There is no special nomenclature for modifier mutations. Many extragenic suppressor loci are called ''sup'' (40 ''sup'' loci defined so far, with a wide variety of properties and mechanisms). An increasing number of more specific modifier gene classes have been established, such as ''smu'' (suppressor of ''mec'' and ''unc''), ''smg'' (''s''uppressor with ''m''orphogenetic effect on ''g''enitalia) and ''sel'' (''s''uppressor/''e''nhancer of ''l''in-12).
 
 
*Intragenic suppressors or modifiers are indicated by adding a second mutation name within parentheses; for example, ''unc-17(e245e2608)'' is an intragenic partial revertant of ''unc-17(e245)''.
 
 
*Mutations known to be chromosomal rearrangements, rather than intragenic lesions, are named differently, as described in the Chromosomal Aberrations section.
 
 
=== Chromosomal Aberrations  ===
 
 
*Duplications (''Dp''), deficiencies (''Df''), inversions (''In'') and translocations (''T'') are known in ''C. elegans'' cytogenetics; these are given italicized names consisting of the laboratory mutation prefix, the relevant abbreviation, and a number, optionally followed by the affected linkage groups in parentheses (e.g., ''eT1(III;V), mnDp5(X;f)'', where ''f'' indicates a free duplication). Chromosomal balancers of unknown structure can be designated using the abbreviation ''C'', e.g., ''mnC1(II)''.
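The aberration naming scheme can be illustrated with a short Python parser. This is a sketch, not a CGC tool: <code>parse_aberration</code> is a hypothetical name, and the pattern assumes a one- or two-letter lowercase lab prefix and semicolon-separated linkage groups, as in the examples above.

```python
import re

ABERRATION_KINDS = {
    "Dp": "duplication",
    "Df": "deficiency",
    "In": "inversion",
    "T": "translocation",
    "C": "balancer of unknown structure",
}

# Assumed shape: lab prefix, aberration abbreviation, number, optional "(LG;LG)".
ABERRATION = re.compile(
    r"^(?P<lab>[a-z]{1,2})(?P<kind>Dp|Df|In|T|C)(?P<number>\d+)(?:\((?P<groups>[^()]*)\))?$"
)


def parse_aberration(name: str):
    """Return lab prefix, aberration kind, number and linkage groups, or None."""
    match = ABERRATION.match(name)
    if match is None:
        return None
    groups = [g.strip() for g in match["groups"].split(";")] if match["groups"] else []
    return {
        "lab": match["lab"],
        "kind": ABERRATION_KINDS[match["kind"]],
        "number": int(match["number"]),
        "linkage_groups": groups,  # "f" marks a free duplication
    }


print(parse_aberration("eT1(III;V)"))
print(parse_aberration("mnDp5(X;f)"))
```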
 
 
== Transposons and Transposon Insertions  ==
 
 
*''C. elegans'' transposons are called Tc1, Tc2, etc., where each number represents a different family. Transposon names are not italicized except when included in a genotype. Different races of ''C. elegans'' have different distributions of these transposons, which result in polymorphic differences from the reference wild-type strain Bristol N2. These natural differences between races are given polymorphism names, as described below.
 
 
*The endogenous transposons of ''C. elegans'' can be mobilized to generate new insertional mutations. In addition, foreign transposons such as Mos1 can be introduced by transformation, and then mobilized to create new insertions. All these newly generated transposon insertions can be named as simple mutations, with an optional suffix indicating the nature of the transposon. They are treated as alleles of named genes if they are located within the boundaries of a gene. Example: ''r293'' is a Tc1 insertion in the gene ''unc-54''. An optional descriptor can also be added after a double colon to indicate the nature of the insertion. Example: ''unc-54(r293::Tc1)''.
 
 
*Note that such insertions may often be silent in terms of gene activity, for example if an insertion occurs within an intron and can be spliced out.
 
 
*Newly generated transposon insertions, especially those located in apparently intergenic regions, may also be given ''Ti'' (transposon insertion) names. These consist of a prefix identifying the laboratory of origin, the two letters ''Ti'', and a number, all italicized. Example: ''eTi13'' is an insertion of a Mos transposon into an intergenic region on ''LGIII''.
 
 
== RFLPs and SNPs  ==
 
 
*Polymorphic sites, which are mostly '''RFLPs''' (restriction fragment length polymorphisms) or '''SNPs''' (single nucleotide polymorphisms), are designated by an italic letter ''P'' and an italic number, preceded by the allele prefix for the laboratory responsible for identifying the site.
 
 
Examples: ''stP17'' and ''stP196'' are RFLPs identified in the laboratory of R. H. Waterston, ''amP9'' and ''amP15'' are SNPs identified in the laboratory of K. Kornfeld.
 
 
== Natural Copy Number Variants  ==
 
 
Dozens of independent natural isolates of ''C. elegans'' have been recovered, from multiple locations around the world. The genomes of some of these isolates contain large (>10 kb) deletions, duplications or insertions, relative to the reference wild-type strain, Bristol N2. Deletions are named with the prefix ''niDf'' (natural isolate deficiency) followed by a number. Duplications and insertions are named with the prefix ''niDp'' (natural isolate duplication or insertion), followed by a number. Numbers for ''niDf'' and ''niDp'' variants are assigned by application to: '''genenames@wormbase.org'''
 
 
== Introgressed regions in near-isogenic lines (aka congenic lines)  ==
 
 
Genetic regions that have been introgressed from one natural isolate of ''C. elegans'' onto the background of a different natural isolate are named in a manner similar to that used for deficiencies (''Df'') and duplications (''Dp''). Each Introgressed Region is given an italicized name consisting of the relevant laboratory mutation prefix, the letters ''IR'', and a number. Thus, a region from the X chromosome of Hawaiian strain CB4856 crossed onto a Bristol N2 background, and created in the Kruglyak lab (allele code ''qq'') has been given the name ''qqIR1''. Additional information about genetic map location and strain origin can be provided in an optional parenthesis. So this example could be more fully written as ''qqIR1(X, CB4856)'', with the implicit assumption that the strain background is Bristol N2. The strain background and the direction of introgression can also be specified, using the symbol &gt;, with this example being written ''qqIR1(X, CB4856&gt;N2)''.
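A minimal sketch of reading an Introgressed Region name, assuming the parenthetical always has the form ''(chromosome, donor)'' or ''(chromosome, donor>background)'' and that strain names are uppercase letters followed by digits. The function <code>parse_introgressed_region</code> is hypothetical; following the text, it reports Bristol N2 as the background when none is written.

```python
import re

# Assumed shape: lab prefix, "IR", number, then an optional
# "(chromosome, donor)" or "(chromosome, donor>background)" parenthetical.
INTROGRESSION = re.compile(
    r"^(?P<lab>[a-z]{1,2})IR(?P<number>\d+)"
    r"(?:\((?P<chromosome>I|II|III|IV|V|X),\s*(?P<donor>[A-Z]+\d+)"
    r"(?:>(?P<background>[A-Z]+\d+))?\))?$"
)


def parse_introgressed_region(name: str):
    """Return the parts of an IR name, or None if it does not fit the assumed shape."""
    match = INTROGRESSION.match(name)
    if match is None:
        return None
    return {
        "lab": match["lab"],
        "number": int(match["number"]),
        "chromosome": match["chromosome"],
        "donor": match["donor"],
        # Bristol N2 is the implicit background when none is written.
        "background": match["background"] or "N2",
    }


print(parse_introgressed_region("qqIR1(X, CB4856>N2)"))
```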
 
 
== Transgenes  ==
 
 
*Transformation of ''C. elegans'' with exogenous DNA by microinjection usually leads to the formation of a transmissible extrachromosomal array containing many copies of the introduced DNA. Sometimes chromosomal integration of the introduced DNA can occur, or an existing extrachromosomal array can be integrated after irradiation of a transgenic line.
 
 
*Extrachromosomal arrays are given italicized names consisting of the laboratory allele prefix, the two letters ''Ex'', and a number.
 
 
*Integrated transgenes are designated by italicized names consisting of the laboratory allele prefix, the two letters ''Is'', and a number.
 
 
*Both ''Ex'' and ''Is'' can optionally be followed by genotypic or molecular information describing the transgene, in square brackets. For example, ''eEx3'' or ''eIs2'' or ''stEx5[sup-7(st5) unc-22(+)]''.
 
 
*Gene fusions incorporated in transgenes that consist of a ''C. elegans'' gene or part thereof fused to a reporter such as lacZ or GFP are indicated by the ''C. elegans'' gene name followed by two colons and the reporter, all italicized: ''pes-1::lacZ, mab-9::GFP''. No specific recommendations have been made for distinguishing between transcriptional and translational fusions.
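The transgene conventions can be sketched as follows. The names <code>parse_transgene</code> and <code>split_fusion</code> are hypothetical, and splitting the bracketed contents on whitespace assumes that individual elements contain no spaces, which holds for the examples above but is not part of the recommendations.

```python
import re

# Assumed shape: lab prefix, "Ex" or "Is", number, optional "[...]" contents.
TRANSGENE = re.compile(
    r"^(?P<lab>[a-z]{1,2})(?P<kind>Ex|Is)(?P<number>\d+)(?:\[(?P<contents>[^\[\]]*)\])?$"
)


def parse_transgene(name: str):
    """Return lab prefix, integration status, number and bracketed contents, or None."""
    match = TRANSGENE.match(name)
    if match is None:
        return None
    contents = match["contents"].split() if match["contents"] else []
    return {
        "lab": match["lab"],
        "integrated": match["kind"] == "Is",  # Is = integrated, Ex = extrachromosomal
        "number": int(match["number"]),
        "contents": contents,
    }


def split_fusion(element: str):
    """Split a reporter fusion such as 'mab-9::GFP' into (gene, reporter)."""
    gene, _, reporter = element.partition("::")
    return gene, reporter or None


print(parse_transgene("stEx5[sup-7(st5) unc-22(+)]"))
print(split_fusion("mab-9::GFP"))  # ('mab-9', 'GFP')
```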
 
 
== Genotypes  ==
 
 
*The genotype of an animal is specified by listing all known differences between its genotype and that of wild type, which is defined by convention as Bristol N2. Each such difference is assigned a unique name. The currently recognized types of difference, described at greater length elsewhere in these guidelines, are:
 
**Simple mutations. Example: ''e2123''.

**New transposon insertions. Example: ''eTi13''.
 
**Sequence polymorphisms. Example: ''stP17''.
 
**Transgenes (extrachromosomal arrays). Example: ''stEx5''.
 
**Transgenes (chromosomally inserted). Example: ''mdIs18''.
 
**Chromosomal aberrations (duplications, deficiencies, inversions, translocations, and crossover suppressors). Examples: ''nDp17, uaDf5, hIn1, eT1, mnC1''.
 
 
*Where necessary, wild-type sequence can be indicated using the symbol +.
 
 
*Because every genetic "feature" (i.e., difference from Bristol N2) has a unique name, an animal's genotype is fully specified by listing all the named features that it carries. Example: ''e2123; mdIs18''.
 
 
For clarity and convenience, additional information about genes, chromosomes, transgene contents, etc. can be added as described elsewhere in this document to produce a more informative genotype. Example: ''pha-1(e2123ts) III; mdIs18[pha-1(+) unc-17::GFP]''
 
 
*Mutants carrying more than one mutation are designated by sequentially listing mutant genes or mutations according to the left-right (= up-down) order on the genetic map. Different linkage groups are separated by a semicolon and given in the order ''I, II, III, IV, V, X, f, M''. ''I''-''V'' are the five autosomes, ''X'' is the ''X'' chromosome, ''f'' refers to free duplications or chromosomal fragments, and ''M'' is the mitochondrial genome. For example: ''dpy-5(e61) I; bli-2(e768) II; unc-32(e189) III''.
 
 
*Heterozygotes, with allelic differences between chromosomes, are designated by separating mutations on the two homologous chromosomes with a slash. Where unambiguous, wild-type alleles can be designated by a plus sign alone, or even omitted. For example, ''dpy-5(e61) unc-13(+)/dpy-5(+) unc-13(e51) I'' can also be written ''dpy-5 +/+ unc-13'' or ''dpy-5/unc-13''.
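The ordering rules for multi-locus genotypes can be illustrated with a small formatter. In this Python sketch, <code>format_genotype</code> is a hypothetical name and each feature carries an assumed numeric map position, used only to order features on the same linkage group; it does not handle heterozygotes or bracketed transgene contents.

```python
from itertools import groupby

# Linkage-group order given above: autosomes, X, free duplications, mitochondria.
LINKAGE_ORDER = ["I", "II", "III", "IV", "V", "X", "f", "M"]


def format_genotype(features):
    """Format (name, linkage_group, map_position) tuples as a genotype string.

    map_position is a hypothetical left-to-right coordinate used only to order
    features that share a linkage group.
    """
    ordered = sorted(features, key=lambda f: (LINKAGE_ORDER.index(f[1]), f[2]))
    blocks = []
    for group, items in groupby(ordered, key=lambda f: f[1]):
        # Features on one linkage group are space-separated, then the group label.
        names = " ".join(name for name, _, _ in items)
        blocks.append(f"{names} {group}")
    # Different linkage groups are separated by semicolons.
    return "; ".join(blocks)


print(format_genotype([
    ("unc-32(e189)", "III", 0.0),
    ("dpy-5(e61)", "I", 0.0),
    ("bli-2(e768)", "II", 0.0),
]))
# dpy-5(e61) I; bli-2(e768) II; unc-32(e189) III
```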
 
 
== Mitochondrial Genome  ==
 
 
The mitochondrial genotype of a worm can be expressed using the standard nomenclature, with ''M'' as the abbreviation for the mitochondrial linkage group. The mitochondrial genotype is written as the last element in the genotype, following the nuclear genotype. Heteroplasmic combinations, where mitochondria of different genotypes co-exist in the same cytoplasm, can be expressed using a double forward slash, //. For example: ''uaDf5//+''.
 
 
== DNA sequences  ==
 
 
*There are no specific recommendations for designating cloned sequences that are not similar to known genes. Most genomic clones have been provided by the ''C. elegans'' mapping/sequencing consortium (based at the [http://wiki.wormbase.org/index.php?title=Cosmids/YACs Wellcome Trust Sanger Institute, Cambridge, UK] and the [http://genome.wustl.edu/ Genome Sequencing Center, St. Louis, USA]). Cosmid clones generated by the consortium are named on the basis of the vector, either pJB8 (initial letters B, C, D, E, R, M, ZC) or a Lorist vector (initial letters K, T, W, F, ZK). Phage clones (in Lambda 2001) are identified by the initial letters A, ZL, YSL. Some fosmid clones are identified by the initial letter H. Vancouver fosmid clones are identified by the initial letters WRM.
 
 
*YACs (yeast artificial chromosome clones) are identified by the initial letter Y, e.g., Y3D5. YAC subsequences may be given names derived from the initial YAC name. Example: subsequences derived from the YAC Y47H9 have been called Y47H9A, Y47H9B, Y47H9C. Note that physical clones corresponding to these subsequences are not available.
 
 
*Genomic DNA clones that have not been generated by the consortium are usually designated by the laboratory strain designation (see below), a # symbol and an isolation number, e.g., MT#JAL6.
 
 
*Sequences that are predicted to be genes from sequence data alone are initially named by the consortium on the basis of the sequenced cosmid, plus a number. For example, the genes predicted for the cosmid T05G3 are called T05G3.1, T05G3.2, etc. (numbered in arbitrary order of definition). Such names can be superseded by standard 3-letter names when this becomes appropriate. Thus, R13F6.3 has been given the name ''srg-12'' (for ''s''erpentine ''r''eceptor, class ''g''amma).
 
 
*EST (Expressed Sequence Tag) clones have received names with prefixes such as cm and yk.
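The clone-naming letters above can be turned into a prefix lookup, together with a parser for cosmid-based predicted gene names. This Python sketch is illustrative only: <code>clone_type</code> and <code>parse_predicted_gene</code> are hypothetical names, and checking the longest prefix first (so that ''WRM'' wins over ''W'' and ''YSL'' over ''Y'') is an assumption about how the letters should be disambiguated.

```python
import re

# Initial letters listed above, mapped to the clone type they indicate.
CLONE_PREFIXES = {
    **{p: "cosmid (pJB8 vector)" for p in ["B", "C", "D", "E", "R", "M", "ZC"]},
    **{p: "cosmid (Lorist vector)" for p in ["K", "T", "W", "F", "ZK"]},
    **{p: "phage (Lambda 2001)" for p in ["A", "ZL", "YSL"]},
    "H": "fosmid",
    "WRM": "Vancouver fosmid",
    "Y": "YAC",
}


def clone_type(clone: str):
    """Return the clone type for a clone name, checking longer prefixes first."""
    for prefix in sorted(CLONE_PREFIXES, key=len, reverse=True):
        if clone.startswith(prefix):
            return CLONE_PREFIXES[prefix]
    return None


# Assumed shape of a predicted gene name: clone name, a dot, and a number.
PREDICTED_GENE = re.compile(r"^(?P<clone>[A-Z][A-Z0-9]*)\.(?P<index>\d+)$")


def parse_predicted_gene(name: str):
    """Split e.g. 'T05G3.2' into ('T05G3', 2); return None for other names."""
    match = PREDICTED_GENE.match(name)
    if match is None:
        return None
    return match["clone"], int(match["index"])


print(clone_type("T05G3"), parse_predicted_gene("T05G3.2"))
```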
 
 
== Proteins  ==
 
 
*The protein product of a gene can be referred to by the relevant gene name, written in non-italic capitals, e.g., the protein encoded by ''unc-13'' can be called UNC-13. Where more than one protein product is predicted for a gene (usually as a result of alternative message processing), the different proteins are distinguished by additional capital letters, e.g., TRA-1A, TRA-1B.
 
 
*Mutant protein products can be named by the missense change, for example a mutant TRA-1A protein with a Pro to Leu change at codon 79 would be written: TRA-1A (P79L).
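A sketch of deriving protein names from gene names, assuming the isoform letter and the missense change are supplied separately. <code>protein_name</code> is a hypothetical helper, and the amino-acid table uses the standard three-letter and one-letter codes.

```python
# Standard three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def protein_name(gene: str, isoform: str = "", missense=None) -> str:
    """Build a protein name such as 'UNC-13' or 'TRA-1A (P79L)'.

    missense is an optional (from_residue, codon, to_residue) tuple using
    three-letter amino-acid codes, e.g. ("Pro", 79, "Leu").
    """
    name = gene.upper() + isoform.upper()
    if missense is not None:
        before, codon, after = missense
        name += f" ({THREE_TO_ONE[before]}{codon}{THREE_TO_ONE[after]})"
    return name


print(protein_name("tra-1", "A", ("Pro", 79, "Leu")))  # TRA-1A (P79L)
```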
 
 
== RNA Molecules  ==
 
 
*Messenger RNA species can be written by using the protein product as a descriptor, for example TRA-1A mRNA, TRA-1B mRNA, in order to allow distinction between different splice variants.
 
 
*Non-coding RNA species can be written using the gene name as a descriptor, for example ''lin-4'' RNA. Small RNA species derived from ''mir'' genes (micro-RNAs) can be written miR-, followed by a number corresponding to the ''mir'' gene. Example: miR-2 for the RNA derived from ''mir-2''.
 
 
== Phenotypes  ==
 
 
*Phenotypic characteristics can be described in words, e.g., dumpy animals or uncoordinated animals. If more convenient, a nonitalicized 3-letter or 4-letter abbreviation, which usually corresponds to a gene name, may be used. The first letter of a phenotypic abbreviation is capitalized, e.g., Unc for uncoordinated, Dpy for dumpy. If necessary to distinguish among related but distinguishable phenotypes, the relevant gene number can be added, e.g., Unc-4 and Unc-13 to differentiate the distinct phenotypes produced by mutations in the two genes ''unc-4'' and ''unc-13''. Abbreviations that do not correspond to gene names can also be used, e.g., Muv for multiple vulval development, and Daf-c for dauer-formation-constitutive. WormBase maintains a standard set of defined phenotype descriptors (the [http://wormbase.org/db/misc/phenotype WormBase Phenotype Ontology]).
 
 
*A common and accepted convention, when comparing a mutant with the wild type, is to use the prefix non- to refer to wild-type phenotypes, for example, non-Lin (= wild-type cell lineage) or Dpy non-Unc (= wild type with respect to movement, but dumpy with respect to body shape).
 
 
== RNAi Phenotypes  ==
 
 
*Animals in which an endogenous gene has been down-regulated by RNA interference (RNAi), after exposure to double-stranded RNA corresponding to that gene, can be referred to as mutants, using italicized ''RNAi'' as the mutation name. Example: ''mog-4(RNAi)'', C08F8.8''(RNAi)''
 
 
*Phenotypes induced by RNAi can be named using conventional mutant phenotype descriptors, such as Unc, Muv, Fem. For high-throughput RNAi screens, which may detect only conspicuous phenotypes, more general phenotype descriptors can be used (see the [http://wormbase.org/db/misc/phenotype Phenotype Ontology]).
 
 
== Strains  ==
 
 
*A strain is a set of individuals of a particular genotype with the capacity to produce more individuals of the same genotype. Strains are given nonitalicized names consisting of two or three uppercase letters followed by a number. The strain letter prefixes refer to the laboratory of origin and are distinct from the mutation letter prefixes. Examples: CB1833 is a strain of genotype ''dpy-5(e61) unc-13(e51)'', originally constructed by S. Brenner at the MRC Laboratory of Molecular Biology (strain prefix CB, allele prefix ''e''), and MT688 is a strain of genotype ''unc-32(e189) +/+ lin-12(n137) III; him-5(e1467) V'', constructed in the laboratory of H.R. Horvitz at M.I.T. (strain prefix MT, allele prefix ''n'').
 
 
*Strain prefixes are listed at the [http://biosci.umn.edu/CGC/nomenclature/code.html CGC].
 
 
*Strains can and should be preserved as frozen stocks at -70 °C or, ideally, in liquid nitrogen, to ensure long-term maintenance and to avoid drift or accumulation of modifier mutations.
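A minimal Python sketch of parsing a strain name, assuming a prefix of two or three uppercase letters followed by digits. The function <code>parse_strain</code> is hypothetical, and the strain-to-allele prefix table contains only the two pairings given in the examples above (CB/''e'' and MT/''n''); the full list is maintained by the CGC.

```python
import re

# Assumed shape: two or three uppercase letters followed by a number.
STRAIN = re.compile(r"^(?P<lab>[A-Z]{2,3})(?P<number>\d+)$")

# Only the two pairings given in the examples above; the CGC keeps the full list.
ALLELE_PREFIX_FOR_STRAIN_PREFIX = {"CB": "e", "MT": "n"}


def parse_strain(name: str):
    """Return strain prefix, number and (if known here) the lab's allele prefix."""
    match = STRAIN.match(name)
    if match is None:
        return None
    lab = match["lab"]
    return {
        "strain_prefix": lab,
        "number": int(match["number"]),
        "allele_prefix": ALLELE_PREFIX_FOR_STRAIN_PREFIX.get(lab),
    }


print(parse_strain("CB1833"))
```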
 
 
== Sources  ==
 
 
*All genetic data for ''C. elegans'' are summarized in WormBase (Bieri et al. 2007, Nucleic Acids Res. 35 Database issue: D506-510; Harris et al. 2004, Nucleic Acids Res. 32 Database issue: D411-417).
 
 
*Queries on recommended nomenclature for ''C. elegans'' should be addressed to '''genenames@wormbase.org''' or to the curator for ''C. elegans'' Genetic Mapping and Genetic Nomenclature (Dr Jonathan Hodgkin, Genetics Unit, Department of Biochemistry, University of Oxford, UK): '''jonathan.hodgkin@bioch.ox.ac.uk'''
 
 
 
 
[[Category:User Guide]]
 

Latest revision as of 12:44, 13 November 2015
