Molecule

From WormBaseWiki

links to relevant pages
Caltech documentation
Example molecule pages


Overview

Molecule curation captures chemical and drug entities that have been shown to affect the biology of the worm. We provide links to other databases that cover these molecule entities in greater detail.

  • What we mean by small molecule
    • drug
    • metabolite (primary and secondary)
    • monomers or very small oligomers of nucleic acids, proteins, and polysaccharides
    • "Large collections of small molecules (molecular weight about 600 or less), of similar or diverse nature which are used for high-throughput screening analysis of the gene function, protein interaction, cellular processing, biochemical pathways, or other chemical interactions." (from nlm.nih.gov and wikipedia)
    • Exogenous protein complexes with toxic effects, for example Bt toxin.

Model

Original model

 ///////////////////////////small molecule/chemical/drug ////////////////////////////
 // 
 // ?Molecule
 //  * metabolites: precursors, intermediates, or end products of a metabolic pathway
 //  * monomeric or very small oligomeric nucleic acids (not RNAi primers), e.g. ATP, ADP, cAMP, GTP, trinucleotide repeats??
 //  * chemicals/drugs
 //  * minerals, ions, salts
 //
 ////////////////////////////////////////////////////////////////////////////////////
 
 ?Molecule  Name ?Text
           Public_name ?Text
           Synonym ?Text
           DB_info Database ?Database ?Database_field Text
           Gene_regulation Gene_regulator ?Gene_regulation XREF Molecule_regulator
           Regulate_expr_cluster ?Expression_cluster XREF Regulated_by_molecule //Wen WS228
	   WBProcess ?WBProcess XREF Molecule
           Affects_phenotype_of     Variation ?Variation ?Phenotype #Evidence
                                    Strain    ?Strain    ?Phenotype #Evidence
                                    Transgene ?Transgene ?Phenotype #Evidence
                                    RNAi      ?RNAi      ?Phenotype #Evidence
                                    Rearrangement  ?Rearrangement  ?Phenotype  #Evidence  //KY [110602 pad]
	   Interaction	?Interaction	XREF	Molecule_regulator
	   Molecule_use  ?Text  #Evidence
	   Reference ?Paper XREF Molecule
	   Remark ?Text #Evidence

Model elements

  • Name-> WBMol:ID
    • Originally this field held the MeSH ID. When no MeSH ID was available, a WBMol ID was assigned, to be replaced by the MeSH ID once one existed. However, Names are used by scripts to construct URLs for the MeSH and CTD database links, so a WBMol ID in the Name field either had to be suppressed or produced a bad URL. The increasing use of Molecule annotation in other curation pipelines also caused confusion about object creation. For these reasons the Name field was changed from MeSH IDs to WBMol IDs for all objects.
  • Public_name -> common name in the C. elegans literature
  • Synonym -> other names used in papers; names are case sensitive. Many other names can be taken from CTD pages. Names need to be pipe separated.
  • DB_info -> links to the entity in other databases. Database URLs need to be added to the GitHub external URL file.
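
The mop_id note further down this page (`&pad8Zeros($newPgid)`) and the sample script output (mop 139 listed as WBMol:00000139) suggest that a WBMol ID is the OA row id padded to eight digits. A minimal Python sketch of that formatting, assuming this reading is correct; the function name `wbmol_id` is hypothetical and is not part of the OA code:

```python
def wbmol_id(pgid: int) -> str:
    """Format an OA row id (pgid) as a zero-padded WBMol identifier.

    Assumption: the ID is 'WBMol:' followed by the pgid padded to 8 digits.
    """
    if pgid < 0:
        raise ValueError("pgid must be non-negative")
    return f"WBMol:{pgid:08d}"


print(wbmol_id(139))
print(wbmol_id(1516))
```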

Requested changes to model

WS253

   Biofunction_affected  GO_term_affected   ?GO_term   ?RO_term #Evidence 
    				Gene_affected ?Gene ?RO_term #Evidence
    				Molecule_affected ?Molecule ?RO_term #Evidence
				Other_affected ?Text ?RO_term #Evidence	
				WBProcess_affected   ?WBProcess  ?RO_term #Evidence

add community curator tags - update batch loaded data to witting
Rationale for WS253 proposed changes

Examples for molecule biofunction curation

Title: Nicotinamide adenine dinucleotide extends the lifespan of Caenorhabditis elegans mediated by sir-2 . 1 and daf-16 . 
Authors: Hashimoto T ; Horikawa M ; Nomura T ; Sakamoto K 
Journal: Biogerontology 
WBPaper00033112 / PMID 19370397
"This result suggests that NAD activates the caloric restriction pathway and DAF-2 / insulin signal pathway independently , and that daf-16 did not contribute to the signaling pathway of caloric restriction " 
"These findings are consistent with previous reports ( Tissenbaum and Guarente 2001 , Yang et al . 2005 ; van der Horst et al . 2004 , 2007 ; Brunet et al . 2004 ) and suggest that NAD activates SIR-2 . 1 , prior to activating the transcriptional activity of DAF-16 "
Molecule: NAD
Biofunction_role: Metabolite
Gene_affected: SIR-2.1  RO: directly activates (RO:0002406)
Gene_affected: DAF-16  RO: increases expression of (RO:0003003)
WBProcess_affected: DAF-2/ insulin signal pathway RO:indirectly activates (RO:0002407) 
---
Title: Genetic and molecular analysis of spe-27 , a gene required for spermiogenesis in Caenorhabditis elegans hermaphrodites . 
Authors: Minniti AN ; Sadler C ; Ward S 
Journal: Genetics 
Year: 1996-05 
Doc ID: WBPaper00002446
"Tyramine release from the RIM ( blue ) activates LGC-55 anion channel , which is expressed in the neck muscles , RMD / SMD motor neurons , and the AVB forward premotor interneuron ( purple ) "
Molecule: Pronase
Molecule: TEA
GO_term_affected: spermatid maturation (GO:0048240) RO:indirectly activates (RO:0002407)
Gene_affected: LGC-55 RO: directly activates (RO:0002406)
---
Title: Identification of a cysteine residue important for the ATPase activity of C . elegans fidgetin homologue . 
Authors: Yakushiji Y ; Yamanaka K ; Ogura T 
Journal: FEBS Lett 
Year: 2004-12-03 
Doc ID: WBPaper00024642
"It was found that ADP strongly inhibits the ATP hydrolysis activity , as little ATPase activity was detected at 3-fold excess of ADP over ATP , suggesting that the F32D1 . 1 ATPase activity may be tightly regulated by the hydrolyzed product ADP"
Molecule: ADP
GO_term_affected: ATPase activity (GO:0016887) RO:directly inhibits (RO:0002408)
Gene_affected: F32D1.1 figl-1 RO:directly inhibits (RO:0002408)
---
Title: Bridging the phenotypic gap : real-time assessment of mitochondrial function and metabolism of the nematode Caenorhabditis elegans . 
Authors: Lagido C ; Pettitt J ; Flett A ; Glover LA 
Journal: BMC Physiol 
Year: 2008 
Doc ID: WBPaper00031649
" Azide inhibits complex IV of the mitochondrial respiratory chain by binding reversibly to cytochrome c oxidase [ 24 ] , this arrests the flow of electrons and leads to a decrease in ATP synthesis "
Molecule: Azide
GO_term_affected: mitochondrial electron transport, cytochrome c to oxygen (GO:0006123) RO:directly inhibits (RO:0002408)
GO_term_affected: mitochondrial electron transport, cytochrome c to oxygen (GO:0006123) RO:molecularly interacts with (RO:0002436)
---
Title: Functional analysis of pyrimidine biosynthesis enzymes using the anticancer drug 5-fluorouracil in Caenorhabditis elegans . 
Authors: Kim S ; Park DH ; Kim TH ; Hwang M ; Shim J 
Journal: FEBS J 
Year: 2009-09 
"No proteins exhibited uridine kinase activity when uridine was used as a substrate , but both C29F7 . 3 and F40F8 . 1 produced UDP when UMP was used as a substrate "
Molecule: UMP
Biofunction_role: Substrate
Gene_affected: C29F7.3 RO: input_of RO:0002352 <could use term, substrate_for>
Gene_affected: F40F8.1 RO: input_of RO:0002352
Molecule: UDP
Biofunction_role: Metabolite
Gene_affected: C29F7.3 RO:produced_by (RO:0003001) <could use term product_of>
Gene_affected: F40F8.1 RO:produced_by (RO:0003001)
---
Title: Mutations in the Caenorhabditis elegans serotonin reuptake transporter MOD-5 reveal serotonin-dependent and -independent activities of fluoxetine . 
Authors: Ranganathan R ; Sawin ER ; Trent C ; Horvitz HR 
Journal: J Neurosci 
Year: 2001-08-15 
Doc ID: WBPaper00004809
"cat-4 encodes GTP cyclohydrolase I ( C . Loer , personal communication ) , which is required for the synthesis of a biopterin cofactor needed for dopamine and 5-HT biosynthesis ( Kapatos et al . , 1999 ) ."
Molecule: biopterin
Biofunction_role: Cofactor
GO_term_affected: dopamine biosynthesis
GO_term_affected: 5-HT biosynthesis
Gene_affected: cat-4 RO:produced_by (RO:0003001) <could use term product_of>

WS252

?Molecule Name ?Text //WBMoleculeID
          Public_name ?Text
          Formula ?Text //imported from ChEBI
          Monoisotopic_mass ?Float //need to set up automated calculation based on formula
          IUPAC ?Text //imported from ChEBI
          SMILES ?Text //imported from ChEBI
          InChi ?Text //imported from ChEBI
          InChiKey ?Text //imported from ChEBI
          Synonym ?Text //mainly from CTD, but also from papers and ChEBI
          DB_info Database ?Database ?Database_field ?Text //links molecule to ChEBI, CTD, KEGG, etc
          Status Detected #Evidence
                 Predicted #Evidence
          Detection_method ?Text #Evidence //NMR, MALDI-MS, HPLC-UV, shotgun lipidomics
          Extraction_method ?Text #Evidence //MeOH, exometabolome, MeOH/Chloroform, 5% trichloroacetic acid
          Nonspecies_source ?Text
          Chemical_synthesis #Evidence
          Endogenous_in ?Species #Evidence
          Biofunction_role Metabolite #Evidence //note there is no ?Text, which is part of the other Biofunction_role tags
                           Regulator ?Text #Evidence
                           Structural_component ?Text #Evidence
                           Cofactor ?Text #Evidence
                           Activator ?Text #Evidence
                           Inhibitor ?Text #Evidence
                           Product ?Text #Evidence
                           Substrate ?Text #Evidence
                           Ligand ?Text #Evidence
                           Receptor ?Text #Evidence
          Essential_for ?Species #Evidence
          WBProcess ?WBProcess XREF Molecule
          Regulate_expr_cluster ?Expression_cluster XREF Regulated_by_molecule    // Wen WS228
          Affects_phenotype_of Variation ?Variation ?Phenotype #Evidence
                               Strain ?Strain ?Phenotype #Evidence
                               Transgene ?Transgene ?Phenotype #Evidence
                               RNAi ?RNAi ?Phenotype #Evidence
                               Rearrangement ?Rearrangement ?Phenotype #Evidence
          Interaction ?Interaction XREF Molecule_interactor // Chris WS244
          Molecule_use ?Text #Evidence
          Reference ?Paper XREF Molecule
          Remark ?Text #Evidence
Rationale for WS252 changes
  • New id tags

The following tags were requested by a user. They are important for metabolomics analysis and can be supplied by ChEBI (except monoisotopic mass, which can come from KEGG).

 
    Formula ?Text
    Monoisotopic_mass ?Float
    IUPAC ?Text
    SMILES ?Text
    InChi ?Text
    InChiKey ?Text
  • Status tag

A user requested that we record whether a molecule has been detected in the worm or has only been predicted. Detection and extraction methods support the claim that a molecule has been detected.

    Status  Detected  #Evidence
            Predicted #Evidence
    Detection_method ?Text #Evidence //NMR, MALDI-MS, HPLC-UV, shotgun lipidomics
    Extraction_method ?Text #Evidence //MeOH, exometabolome, MeOH/Chloroform, 5% trichloroacetic acid
  • Source tags

User requested that we indicate if the molecule is made endogenously in the worm or if it is an essential molecule needing to be supplied from elsewhere

    Chemical_synthesis #Evidence //Requested by F.Schroeder to index chemically synthesized molecules
    Nonspecies_source ?Text 
    Endogenous_in ?Species #Evidence
    Essential_for ?Species #Evidence
  • Biofunction_role tags

These tags record specific roles, actions, and relationships between molecules and other entities. Biofunction role tags were requested by users and should make these relationships easy to mine from the database.

          Biofunction_role Metabolite ?Text #Evidence
                           Regulator ?Text #Evidence
                           Structural_component ?Text #Evidence
                           Cofactor ?Text #Evidence
                           Activator ?Text #Evidence
                           Inhibitor ?Text #Evidence
                           Product ?Text #Evidence
                           Substrate ?Text #Evidence
                           Ligand ?Text #Evidence
                           Receptor ?Text #Evidence

 ///////////////////////////////////////////////////////////////////////////////////

Corresponding changes in touched models

/////
?Phenotype_info    Affected_by  Molecule  ?Molecule    #Evidence
/////
/////
?Gene_regulation  Regulator Molecule_regulator   ?Molecule  XREF  Gene_regulator  #Boolean 
/////

Molecule databases

Molecule IDs will be provided, when available, for the following databases:

  • Database "NLM_MeSH" "UID"
  • Database "CTD" "ChemicalID"
  • Database "ChemIDplus" using the CasRN
  • Database "ChEBI" "CHEBI_ID"
  • Database "KEGG COMPOUND" "ACCESSION_NUMBER"
  • Database "SMID-DB"

ChEBI

ChEBI terms and IDs are pulled from the chebi.obo on the ChEBI ftp server. This file is updated monthly (first Monday of the month except in the case of a Bank Holiday when it becomes the Tuesday). The ChEBI website is updated daily.

chebi.obo is stored on tazendra here: /home/postgres/work/pgpopulation/obo_oa_ontologies

Syncing with ChEBI

find_missing_chebi_in_other_oa.pl
identifies any molecule in the mop tables that has no mop_chebi ID but has been used in curation, as shown by paper entries in the mop, app, grg, pro, or rna tables. The output lists molecules that have not been manually matched to a ChEBI ID. These are of three types:
1. exists in chebi.obo - if so, manually fill in the ChEBI ID
2. does not exist in chebi.obo but can be requested - if so, request the term from ChEBI
3. does not exist in chebi.obo and cannot be requested - most likely the molecule is a protein or complex, neither of which is curated by ChEBI, e.g., Shiga Toxin, BTtoxin, streptolysin. Currently there is no alternative database to link these to; at some point we should check whether there is a UniProt ID for them.

/home/acedb/karen/molecule/chebi_missing/find_missing_chebi_in_other_oa.pl

checks molecule entries without chebi that have data for any of :
app_molecule
grg_moleculeregulator
pro_molecule
rna_molecule
mop_paper

Prints mop_publicname, mop_molecule, mop_chemi, mop_paper and mop_smmid, as well as the corresponding papers from the other OAs. All papers for a given molecule object are aggregated and converted to PMIDs where a PMID exists.

Output is written to the file 'out'.

output looks like this:
 
WBMol:00000139	Oxamniquine
mop	139
mop_chemi	21738-42-1
mop_molecule	D010073
mop_paper	"WBPaper00038322","WBPaper00031181"
aggregatedWBPapers	WBPaper00031181, WBPaper00038322
aggregatedWBPmids	pmid17988075, pmid21500322

WBMol:00000155	icariin
mop	155
mop_chemi	489-32-7
mop_molecule	C056599
mop_paper	"WBPaper00040566"
aggregatedWBPapers	WBPaper00040566
aggregatedWBPmids	pmid22216122
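
The selection logic described above can be sketched in Python as follows. This is an illustration only, not the Perl script itself: the in-memory dictionaries stand in for the postgres mop tables and the other OA tables, and the function and key names are hypothetical. The two molecules without ChEBI IDs are taken from the sample output above; the resveratrol entry is taken from the test.ace example on this page.

```python
def find_missing_chebi(mop, other_oa_papers):
    """Report molecules that lack a ChEBI ID but have been used in curation.

    mop: dict of pgid -> {"publicname", "chebi", "papers"} (stand-in for mop tables)
    other_oa_papers: dict of pgid -> paper IDs found via app/grg/pro/rna tables
    """
    report = []
    for pgid in sorted(mop):
        entry = mop[pgid]
        if entry.get("chebi"):
            continue  # already matched to a ChEBI ID
        # Aggregate papers from the molecule OA and from the other OAs.
        papers = set(entry.get("papers", [])) | set(other_oa_papers.get(pgid, []))
        if papers:  # only molecules that were actually used in curation
            report.append({
                "id": f"WBMol:{pgid:08d}",
                "publicname": entry["publicname"],
                "papers": sorted(papers),
            })
    return report


mop = {
    139: {"publicname": "Oxamniquine", "chebi": None,
          "papers": ["WBPaper00038322", "WBPaper00031181"]},
    155: {"publicname": "icariin", "chebi": None,
          "papers": ["WBPaper00040566"]},
    4910: {"publicname": "resveratrol", "chebi": "27881",
           "papers": ["WBPaper00026929"]},
}
other_oa_papers = {155: ["WBPaper00040566"]}

missing = find_missing_chebi(mop, other_oa_papers)
for row in missing:
    print(row["id"], row["publicname"], ", ".join(row["papers"]), sep="\t")
```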

compare_chebi.pl (2015 11 04)
The goal of this script is to import more IDs from ChEBI into the mop_tables. The import relies on mop entries matching chebi objects. So the first thing the script does is to compare chebi.obo object name with mop_publicname and mop_synonym, in a case insensitive way. chebi.obo comes from ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.obo

The script outputs three files:

  • chebis_in_mop_vs_obo - compares chebi.obo and the mop tables and lists ChEBI IDs that are present in one but not the other, for example:
100241 not in mop
100246 not in mop
100461 not in mop
100921 not in chebi
101085 not in mop
101096 not in mop
  • chebi_name_discrepancies_case_sensitive and chebi_name_discrepancies

These files list the mop entries that have ChEBI IDs but whose public names or synonyms do not match the ChEBI names. The first file is case sensitive, so it contains many entries where the names differ only in case (467 lines). The second file is case insensitive and is much shorter (73 lines). I went through these manually and verified that the entities were the same, with two exceptions:

CHEBI:30829 petroselaidic acid (CasRN 593-40-8); MOP: petroselinic acid (CasRN 4712-34-9)
CHEBI:53692 hypaque (CasRN 117-96-4); MOP: histopaque (CasRN 737-31-5)
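
The name comparison behind the two discrepancy files can be sketched in Python as below. This is a rough illustration under stated assumptions, not the Perl script: names are held in plain dictionaries keyed by ChEBI ID, the function name is hypothetical, and the sample names are illustrative (the petroselinic/petroselaidic pair comes from the list above).

```python
def name_discrepancies(mop_names, chebi_names, case_sensitive=False):
    """Return ChEBI IDs present in both sets whose names share no match.

    mop_names / chebi_names: dict of ChEBI ID -> set of names
    (public name plus synonyms on the mop side).
    """
    normalise = (lambda s: s) if case_sensitive else str.casefold
    discrepancies = []
    for chebi_id in sorted(mop_names.keys() & chebi_names.keys()):
        mop_set = {normalise(n) for n in mop_names[chebi_id]}
        obo_set = {normalise(n) for n in chebi_names[chebi_id]}
        if not mop_set & obo_set:
            discrepancies.append(chebi_id)
    return discrepancies


# Illustrative data only.
mop_names = {"30829": {"petroselinic acid"}, "27881": {"Resveratrol"}}
chebi_names = {"30829": {"petroselaidic acid"}, "27881": {"resveratrol"}}

print(name_discrepancies(mop_names, chebi_names))
print(name_discrepancies(mop_names, chebi_names, case_sensitive=True))
```

The case-insensitive run reports only the genuine mismatch, while the case-sensitive run also reports entries that differ only in capitalisation, which is why the case-sensitive file is so much longer.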

Once names are matched, the corresponding values will be imported into mop_tables as follows:
mop_table	chebi.obo entry	example
mop_formula	synonym: "<formula>" RELATED FORMULA [ChEBI:]  (changed)
mop_iupac (table not created?)	synonym: "<NAME>" EXACT IUPAC_NAME [IUPAC:]
mop_smiles	synonym: "<smiles>" RELATED SMILES [ChEBI:]
mop_inchi	synonym: "<InChI=>" RELATED InChI [ChEBI:]
mop_inchikey	synonym: "<inchikey>" RELATED InChIKey [ChEBI:]
mop_kegg	KEGG COMPOUND:C00742 "KEGG COMPOUND"  (added)

Molecule curation

Bulk (batch) upload

Instructions for general batch upload here
Create a spreadsheet with column names that match the molecule OA tables
Use http://www.metaboanalyst.ca/faces/home.xhtml to map IDs - at a minimum, ChEBI IDs need to be obtained from KEGG IDs

mop_paper	mop_name	mop_publicname	mop_synonym	mop_chemi	mop_kegg	mop_chebi	mop_molformula	mop_biorole	mop_bioroletext	mop_status	mop_detctmethod	mop_otherdetctmethod	mop_extrctmethod	mop_otherextrctmethod	mop_chemicalsynthesis	mop_nonbiosource	mop_endogenousin
  • grab WBMol IDs and place in mop_name column
  • place WBPaperIDs in quotes as the OA field is multiontology
  • use species code for mop_endogenousin , i.e., for Caenorhabditis elegans, enter 6239
  • test upload on mangolassi with
./populate_oa_tab_file.pl mangolassi #### realfile.tsv      where #### is WBPersonID
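
A tab-delimited upload file with the columns above could be assembled along these lines. This is a sketch only: the helper names are hypothetical, the exact file layout expected by populate_oa_tab_file.pl is assumed (header row of column names, one molecule per line), and the NAD values are taken from the examples elsewhere on this page.

```python
COLUMNS = [
    "mop_paper", "mop_name", "mop_publicname", "mop_synonym", "mop_chemi",
    "mop_kegg", "mop_chebi", "mop_molformula", "mop_biorole", "mop_bioroletext",
    "mop_status", "mop_detctmethod", "mop_otherdetctmethod", "mop_extrctmethod",
    "mop_otherextrctmethod", "mop_chemicalsynthesis", "mop_nonbiosource",
    "mop_endogenousin",
]


def quote_papers(paper_ids):
    """WBPaper IDs go in quotes because the OA field is multiontology."""
    return ",".join(f'"{p}"' for p in paper_ids)


def make_row(**values):
    """Build one row in column order; unset columns are left empty."""
    unknown = set(values) - set(COLUMNS)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    return [values.get(col, "") for col in COLUMNS]


def to_tsv(rows):
    """Join the header and rows into tab-delimited text."""
    lines = ["\t".join(COLUMNS)]
    lines.extend("\t".join(row) for row in rows)
    return "\n".join(lines) + "\n"


row = make_row(
    mop_paper=quote_papers(["WBPaper00033112"]),
    mop_name="WBMol:00001853",
    mop_publicname="NAD",
    mop_chemi="53-84-9",
    mop_chebi="13393",
    mop_biorole="Metabolite",
    mop_endogenousin="6239",  # species code for Caenorhabditis elegans
)
print(to_tsv([row]), end="")
```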

populate missing mop_table values from chebi.obo with

/home/postgres/work/pgpopulation/mop_molecule/20151104_chebi_again/populate_other_chebi.out
make sure chebi.obo file is in the same directory.

Drug-phenotype curation

Molecules will be linked to genes based on their influence on gene activity altered by variation, overexpression, and RNAi-based knockdown.

Drug-gene interactions

Molecules will also be linked to genes through their influence on gene activity directly through gene regulation interactions.

--kjy 20:03, 16 December 2011 (UTC)

Molecule OA

Changes for WS252

  • syncing with chebi.obo
 on mangolassi /home/postgres/work/pgpopulation/mop_molecule/20151104_chebi_again/
 chebis_in_mop_vs_obo - chebi IDs in one file but not the other
 chebi_name_discrepancies - among chebi IDs that exist in both sets, when no name or synonym match the other
 compare_chebi.pl - works off of chebi.obo, can run in own directory with updated chebi.obo (copy script to directory with new version of chebi.obo)
  • pgpopulation script
 on mangolassi /home/postgres/work/pgpopulation/mop_molecule/20151104_chebi_again/populate_other_chebi.out

-incorporates chebi values for the other tables (mop_kegg, mop_smiles, mop_inchi, mop_inchikey, mop_molformula, mop_iupac; note: not mop_exactmass)
-fills in mop_kegg from chebi.obo when the KEGG compound ID in mop_kegg is missing. If the KEGG compound ID in chebi.obo differs from the one in mop_kegg, the mop_kegg value is replaced with the chebi.obo value and the discrepancy is reported in the output.

 Output tells you if a chebiID does not exist. 
 'HAS MOP' are entries that have an obo and mop value 
 same vs diff is whether what's in obo and mop is the same or different
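
The KEGG rule described above (fill in when missing, overwrite and report when different) can be sketched in Python as follows. The function name and the dictionary inputs are hypothetical stand-ins for the postgres tables and the parsed chebi.obo. The first two KEGG IDs come from the test.ace examples on this page; the third entry is invented purely to show a discrepancy.

```python
def reconcile_kegg(mop_kegg, obo_kegg):
    """Apply chebi.obo KEGG IDs to mop_kegg values.

    mop_kegg / obo_kegg: dict of molecule ID -> KEGG compound ID
    Returns (updated mop_kegg, list of (molecule, old value, new value)).
    """
    updated = dict(mop_kegg)
    discrepancies = []
    for mol, obo_value in obo_kegg.items():
        current = updated.get(mol)
        if not current:
            updated[mol] = obo_value          # missing in mop: fill in
        elif current != obo_value:
            discrepancies.append((mol, current, obo_value))
            updated[mol] = obo_value          # different: chebi.obo wins
    return updated, discrepancies


mop_kegg = {
    "WBMol:00001516": "C06771",   # same in both
    "WBMol:00004910": "",         # missing in mop
    "WBMol:99999999": "C00001",   # invented entry, differs from obo
}
obo_kegg = {
    "WBMol:00001516": "C06771",
    "WBMol:00004910": "C03582",
    "WBMol:99999999": "C00002",
}

updated, discrepancies = reconcile_kegg(mop_kegg, obo_kegg)
for mol, old, new in discrepancies:
    print(f"{mol}\tdiff\tmop={old}\tobo={new}")
```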
mop_table	chebi.obo entry	example
mop_molformula	synonym: "<formula>" RELATED FORMULA [ChEBI:]  (changed)
mop_iupac (table not created?)	synonym: "<NAME>" EXACT IUPAC_NAME [IUPAC:]
mop_smiles	synonym: "<smiles>" RELATED SMILES [ChEBI:]
mop_inchi	synonym: "<InChI=>" RELATED InChI [ChEBI:]
mop_inchikey	synonym: "<inchikey>" RELATED InChIKey [ChEBI:]
mop_kegg  KEGG COMPOUND:C00742 "KEGG COMPOUND"  (added)
mop_chemi xref: ChemIDplus:2140-79-6 "CAS Registry Number" (added 11/27/15)
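
The mapping in this table can be read off a chebi.obo stanza with a few regular expressions. The Python sketch below assumes the line shapes shown in the table (the real file may vary between ChEBI releases), and the function and constant names are hypothetical. The stanza is a hand-written illustration using the resveratrol values from the test.ace example on this page, not a copy of the actual ChEBI entry.

```python
import re

# synonym: "<value>" RELATED|EXACT <KIND> [source]
SYNONYM_RE = re.compile(r'^synonym: "(?P<value>.*)" (?:RELATED|EXACT) (?P<kind>\S+) \[')
KEGG_RE = re.compile(r'KEGG COMPOUND:(C\d+) "KEGG COMPOUND"')
CAS_RE = re.compile(r'^xref: ChemIDplus:(\S+) "CAS Registry Number"')

KIND_TO_TABLE = {
    "FORMULA": "mop_molformula",
    "IUPAC_NAME": "mop_iupac",
    "SMILES": "mop_smiles",
    "InChI": "mop_inchi",
    "InChIKey": "mop_inchikey",
}


def parse_stanza(lines):
    """Collect mop table values from the lines of one chebi.obo [Term] stanza."""
    values = {}
    for raw in lines:
        line = raw.strip()
        match = SYNONYM_RE.match(line)
        if match and match["kind"] in KIND_TO_TABLE:
            values[KIND_TO_TABLE[match["kind"]]] = match["value"]
            continue
        match = KEGG_RE.search(line)
        if match:
            values["mop_kegg"] = match[1]
            continue
        match = CAS_RE.match(line)
        if match:
            values["mop_chemi"] = match[1]
    return values


stanza = '''[Term]
id: CHEBI:27881
name: resveratrol
synonym: "Trans-resveratrol" RELATED [ChEBI:]
synonym: "C14H12O3" RELATED FORMULA [ChEBI:]
synonym: "LUKBXSAWLPMMSZ-UHFFFAOYSA-N" RELATED InChIKey [ChEBI:]
xref: KEGG COMPOUND:C03582 "KEGG COMPOUND"
xref: ChemIDplus:501-36-0 "CAS Registry Number"
'''.splitlines()

values = parse_stanza(stanza)
for table, value in sorted(values.items()):
    print(table, value, sep="\t")
```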

Initial mop tables

For entering, editing, and storing information about molecule objects.

on tazendra or mangolassi:
~postgres/public_html/cgi-bin/oa/wormOA.pm

&initWormFields is like a switch to load a specific OA. Each OA has a 3-letter code that corresponds to the postgres table name prefix, so 'mop' is for the molecule OA and the 'mop_' tables in postgres. 'mop' then calls &initWormMopFields.
Currently, around line 1200, there are sets of hashes like:
 $fields{mop}{id}{type}                             = 'text';
 $fields{mop}{id}{label}                            = 'pgid';
 $fields{mop}{id}{tab}                              = 'all';
 $fields{mop}{paper}{type}                          = 'multiontology';
 $fields{mop}{paper}{label}                         = 'WBPaper';
 $fields{mop}{paper}{tab}                           = 'all';
 $fields{mop}{paper}{ontology_type}                 = 'WBPaper';
 $fields{mop}{name}{type}                           = 'text';
 $fields{mop}{name}{label}                          = 'Name';
 $fields{mop}{name}{tab}                            = 'all';

Most of these correspond to a postgres table. The first one, the pgid, is the exception. Excluding it, the example above covers 2 postgres tables, mop_paper and mop_name.
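
The field-to-table convention can be illustrated with a small Python mirror of the Perl hash. This is not the OA code (which is Perl, in wormOA.pm); the dictionary and the function name are hypothetical and only show how the table names follow from the field keys.

```python
# Python stand-in for the $fields{mop}{...} hash shown above.
fields = {
    "mop": {
        "id": {"type": "text", "label": "pgid", "tab": "all"},
        "paper": {"type": "multiontology", "label": "WBPaper", "tab": "all",
                  "ontology_type": "WBPaper"},
        "name": {"type": "text", "label": "Name", "tab": "all"},
    }
}


def postgres_tables(fields, oa_code):
    """List the postgres tables implied by an OA's fields.

    Each field maps to '<oa_code>_<field>', except 'id', which is the pgid
    and has no table of its own.
    """
    return [f"{oa_code}_{field}" for field in fields[oa_code] if field != "id"]


print(postgres_tables(fields, "mop"))
```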

  • {type} is the type of data they hold
    • text
    • bigtext
    • ontology
    • dropdown
    • toggle
    • multiontology
    • multidropdown
  • {label} is the field name in the OA
  • {tab} is which tab number it should show in. 'all' shows up in all tabs, but is mostly used in OAs with no numbered tabs
  • {ontology_type} if {type} is ontology / multiontology, ontology_type refers to the type of values that go there for the autocomplete and validation.
  • {ontology_table} if the type is ontology / multiontology, for example:
 $fields{mop}{chebi}{type}                          = 'ontology';
 $fields{mop}{chebi}{label}                         = 'ChEBI_ID';
 $fields{mop}{chebi}{tab}                           = 'all';
 $fields{mop}{chebi}{ontology_type}                 = 'obo';
 $fields{mop}{chebi}{ontology_table}                = 'chebi';

This corresponds to the obo_ tables that are updated via cronjob or populated just once. In this case it maps to obo_<name|data|syn>_chebi

  • {dropdown_type} if the {type} is dropdown, which refers to the type of data like the {ontology_type}, but the values are hardcoded further down in the code instead of stored in postgres for querying.

mop_tables

postgres tables for the Molecule class
dumper on tazendra at karen/Molecule/dump_molecule_ace.pl
Requested changes (WS254 / WS253)	Model tag	table	type	OA label	tab	ontology_type	ontology_table	cross table population script	comments and past change requests (WSVersion)
-	pgonly	mop_id	not a table	pgid	1	-	-	-	WBMol: $molId = &pad8Zeros($newPgid)
-	pgonly	mop_timestamp	not a table	none	-	-	-	-	-
-	pgonly	mop_curator	dropdown	Curator	1	-	-	-	same values in all OA curation tables
-	Name	mop_name	text	Name	1	-	-	-	WBMolID - added after tables were built
-	Public_name	mop_publicname	bigtext	Public_name	all	-	-	-	-
-	Synonym	mop_synonym	bigtext	Synonyms	1	-	-	-	-
-	DB_info	mop_molecule	text	MeSH / CTD or default	1	-	-	-	use MeSH ID here; originally used as Name; WS252 - change field name to MeSH/CTD
-	DB_info	mop_chemi	text	CasRN	1	-	-	-	-
-	DB_info	mop_chebi	ontology	ChEBI_id	1	obo	chebi	downloaded from ChEBI server, updated monthly, ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.obo	-
-	DB_info	mop_kegg	text	Kegg compound (Acc#)	1	-	-	-	-
-	Remark	mop_remark	bigtext	Remark	1	-	-	-	-
-	Reference	mop_paper	multiontology	WBPaper	all	WBPaper	pap_tables	-	-
-	DB_info	mop_smmid	text	SMID-DB	1	-	-	-	-
-	InChi	mop_inchi	text	InChi(standard)	1	-	-	import from chebi.obo	WS252-DUMP
-	InChiKey	mop_inchikey	text	InChi key	1	-	-	import from chebi.obo	WS252-DUMP
-	SMILES	mop_smiles	text	SMILES	1	-	-	-	-
use for calculating mop_exactmass	Formula	mop_molformula	text	Formula	1	-	-	import from chebi.obo	WS252-DUMP
-	IUPAC	mop_iupac	text	IUPAC	1	-	-	import from chebi.obo	WS252-ADD and DUMP
Calculate this value, see section below	Monoisotopic_mass	mop_exactmass	read-only	Exact mass	1	-	-	figure out how to calculate based on mop_molformula	WS252-DUMP
-	Molecule_use	mop_moleculeuse	bigtext	Molecule use	2	-	-	-	-
-	Not in model	mop_gotarget	multiontology	GO target	2	obo	goid	-	WS252-DELETED
-	Not in model	mop_genetarget	multiontology	Gene target	2	WBGene	gin_tables	-	WS252-DELETED
-	-	mop_classified	text	Classified as	2	dropdown	-	-	WS252-DELETED
-	-	mop_source	text	source	2	dropdown	-	-	WS252-DELETED
-	-	mop_species	text	Species	2	-	-	-	WS252-DELETED
-	-	mop_role	dropdown	Role	2	-	-	-	WS252-DELETED
-	Biofunction_role	mop_biorole	dropdown	BioRole	2	dropdown values{Metabolite, Regulator, Structural_component, Cofactor, Activator, Inhibitor, Product, Substrate, Ligand, Receptor}	-	-	WS252-ADD and DUMP with evidence
-	-	mop_bioroletext	text	BioRoleText	2	-	-	corresponds to ?Text in model lines like: Structural_component ?Text #Evidence under the Biofunction_role tag	WS252-ADD and DUMP in line with Biofunction_role tag and evidence
-	Essential_for	mop_essentialfor	ontology	EssentialForSpecies	2	species list as in dis_species?	-	-	WS252-ADD and DUMP with evidence
-	Status	mop_status	dropdown	Status	2	dropdown{Detected, Predicted}	-	-	WS252-ADD and DUMP with evidence
WS253 add to dropdown list: HR-MAS-NMR, GC-MS, LC-MS, LC-Coularray	Detection_method	mop_detctmethod	dropdown	DetectionMethod	2	dropdown values{NMR, MALDI-MS, HPLC-UV, shotgun lipidomics}	-	-	WS252-ADD and DUMP with evidence
-	Detection_method	mop_otherdetctmethod	text	OtherDetectionMethod	2	-	-	-	WS252-ADD and DUMP with evidence
WS253 add to dropdown list: 80% MeOH, ACN/0.1M NaCl, EtOH, M9, MeOH/CHCl3, MeOH/MTBE, phosphate buffer, mobile phase, M9 buffer	Extraction_method	mop_extrctmethod	dropdown	ExtractionMethod	2	dropdown values{MeOH, exometabolome, MeOH/Chloroform, 5% trichloroacetic acid}	-	-	WS252-ADD and DUMP with evidence
-	Extraction_method	mop_otherextrctmethod	text	OtherExtractionMethod	2	-	-	-	WS252-ADD and DUMP with evidence
-	Chemical_synthesis	mop_chemicalsynthesis	toggle	Chemical synthesis	2	-	-	-	WS252-ADD and DUMP with evidence
-	Nonspecies_source	mop_nonbiosource	text	NonSpeciesSource	2	-	-	-	WS252-ADD and DUMP
-	Endogenous_in	mop_endogenousin	ontology	EndogenousSpecies	2	species list as in app_pathogen - obo tables for ncbitaxonid	-	-	WS252-ADD and DUMP with evidence
WS253 Add and dump	-	mop_communitycurator	ontology	CommCurator	2	person ontology	-	-	-
WS253 Add and dump	-	mop_communitycuratoremail	text	CommCuratorEmail	2	-	-	-	-
-	Gene_regulation	grg_moleculeregulator	multiontology	Molecule Regulator	GENEREG tab3	Molecule	mop_tables	-	-
-	WBProcess	pro_molecule	multiontology	Molecule	PROCESS tab1	Molecule	mop_tables	-	-
-	Phenotype	app_molecule	multiontology	Molecule	PHENOTYPE tab2	Molecule	mop_tables	-	-
-	RNAi	rna_molecule	multiontology	Molecule	RNAi tab2	Molecule	mop_tables	-	-
-	Interaction	int_moleculenondir, int_moleculeone, int_moleculetwo	Molecule	Effected molecule, Affected molecule, Nondirectional molecule	INTERACTION tab3	Molecule	mop_tables	-	Molecules were initially curated through a/effected_other fields and stored as text in a field in tab4.
  • "all" can mean all tabs, but also is used for OA's that only have one tab.

Calculating monoisotopic mass

Also called Exact mass

Calculate the mass of the entry in mop_molformula when it exists. Examples include:

  • CH3COOH
  • H2SO4
  • Mg(OH)2 -> this needs to be translated to Mg + O(2) + H(2)

Calculation is the sum of the isotopic masses of each occurrence of each element in the mol formula.

Ex: CH3COOH has 2C, 4H and 2O
Masses are 
C = 12.000000
H = 1.007825
O = 15.994915

So the monoisotopic mass is 2(12.000000) + 4(1.007825) + 2(15.994915) = 60.021130
Enter 60.021130 into mop_exactmass

Masses are retrieved from http://www2.warwick.ac.uk/fac/sci/chemistry/chemintra/postgrad/taughtmasters/mscinfo/ch908/atomic_mass_abund.pdf
You need to use the value with the highest abundance for those elements that have multiple masses.
I can provide a better table for reference
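
The calculation above, including the expansion of parenthesised groups such as Mg(OH)2, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the planned automated calculation: the function names are hypothetical, and the mass table holds only a few elements. The C, H and O masses are the ones listed above; the Mg, S, N and P values are standard masses of the most abundant isotope, added here for illustration and not taken from this page.

```python
import re
from collections import Counter

# Mass of the most abundant isotope of each element.
# C, H, O are from the text above; the rest are added for illustration.
MONOISOTOPIC = {
    "C": 12.000000, "H": 1.007825, "O": 15.994915,
    "N": 14.003074, "P": 30.973762, "S": 31.972071, "Mg": 23.985042,
}


def count_atoms(formula):
    """Count atoms in a formula, expanding groups like (OH)2."""
    stack = [Counter()]
    i = 0
    while i < len(formula):
        if formula[i] == "(":
            stack.append(Counter())
            i += 1
        elif formula[i] == ")":
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {formula!r}")
            i += 1
            digits = re.match(r"\d+", formula[i:])
            multiplier = int(digits.group()) if digits else 1
            i += len(digits.group()) if digits else 0
            group = stack.pop()
            for element, n in group.items():
                stack[-1][element] += n * multiplier
        else:
            atom = re.match(r"([A-Z][a-z]?)(\d*)", formula[i:])
            if not atom:
                raise ValueError(f"cannot parse {formula!r} at position {i}")
            stack[-1][atom.group(1)] += int(atom.group(2) or 1)
            i += atom.end()
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {formula!r}")
    return dict(stack[0])


def monoisotopic_mass(formula):
    """Sum the isotopic mass of every atom occurrence in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in count_atoms(formula).items())


print(count_atoms("CH3COOH"))
print(f"{monoisotopic_mass('CH3COOH'):.6f}")
print(count_atoms("Mg(OH)2"))
```

An element missing from the table raises a KeyError, which seems preferable to silently producing a wrong mass for a formula the table does not cover.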

Molecule list

Molecule names have changed to WBMoleculeIDs
Initially, we will be using MeSH UIDs, assigned by the NLM, as IDs for the molecules in our database. Due to the more comprehensive coverage of the NLM molecules, and the fact that it is more stably funded, this source was thought to be a good starting point for this project. The list we are starting with has been pared down and was created by the Comparative Toxicogenomic Database (CTD), which contains over 130,000 terms. For each term, this list contains a term name, CTD ID, MeSH UID, and where available CAS Registry Numbers. Using the CasRNs, we extracted the ChEBI ID from the Chemical Entities of Biological Interest database entity list, where it existed, along with any KEGG Compound accession number.

A sample molecule.ace record:

Molecule : "C009687"
Public_name "wortmannin"
Database "NLM_MeSH" "UID" "C009687"
Database "CTD"  "ChemicalID" "C009687"
Database "ChemIDplus"  "19545-26-7"
Database "ChEBI" "CHEBI_ID" "52289"
Database "KEGG COMPOUND" "ACCESSION_NUMBER" "C15181"

To make a working list of reference molecules for the various curation efforts, we used Textpresso to scan for all terms on the list that have been published in the C. elegans corpus. The resulting list has fewer than 6000 terms. The terms identified in the corpus are available, ordered by frequency in the corpus and concatenated into one file.
This last file is being used as a starting file for molecule look-up by WB curators.
Caveats and notes:

  • The list is now small enough that if we wanted to load it into WB at least we know that every term has some relevance to the literature (although unverified).
  • The list is small enough to be amenable to editing through ontology editors like OBOedit (even though it is not an ontology).
  • We do not have definitions of the terms, nor are the terms arranged in any hierarchical manner; however other databases do, and we provide links to those websites if an ID is available.
  • Terms and synonyms of terms will be added as needed. This curation effort still needs to be worked out; ideally the list will be incorporated as a selection list in whatever curation tool a curator is using.

Molecule Curation Pipeline

Molecule Upload

test.ace

#NOTE: use spaces instead of tabs
Molecule : "WBMol:00001516"
Public_name	"triethanolamine"
Synonym	"TEA"
Database	"NLM_MeSH" "UID" "C009546"
Database	"CTD" "ChemicalID" "C009546"
Database	"ChEBI" "CHEBI_ID" "28621"
Database	"ChemIDplus" "RN" "102-71-6"
Database	"KEGG COMPOUND" "ACCESSION_NUMBER" "C06771"
Molecule_use	"spermatid activator"
Reference	"WBPaper00040819"

Molecule : "WBMol:00001853"
Public_name	"NAD"
Monoisotopic_mass "664.116399"
Formula "C21H28N7O14P2"
IUPAC "adenosine 5'-{3-[1-(3-carbamoylpyridinio)-1,4-anhydro-D-ribitol-5-yl] dihydrogen diphosphate}"
SMILES "[H][C@]1(COP(O)(=O)OP(O)(=O)OC[C@@]2([H])O[C@@]([H])([N+]3=CC=CC(=C3)C(O)=N)[C@]([H])(O)[C@]2([H])O)O[C@@]([H])(N2C=NC3=C(N)N=CN=C23)[C@]([H])(O)[C@]1([H])O"
InChi "InChI=1S/C21H27N7O14P2/c22-17-12-19(25-7-24-17)28(8-26-12)21-16(32)14(30)11(41-21)6-39-44(36,37)42-43(34,35)38-5-10-13(29)15(31)20(40-10)27-3-1-2-9(4-27)18(23)33/h1-4,7-8,10-11,13-16,20-21,29-32H,5-6H2,(H5-,22,23,24,25,33,34,35,36,37)/p+1/t10-,11-,13-,14-,15-,16-,20-,21-/m1/s1"
InChiKey "BAWFJGJZGIEFAR-NNYOXOHSSA-O"
Database	"NLM_MeSH" "UID" "D009243"
Database	"CTD" "ChemicalID" "D009243"
Database	"ChEBI" "CHEBI_ID" "13393"
Database	"ChemIDplus" "RN" "53-84-9"
Metabolite
Endogenous_in "Caenorhabditis elegans"

Molecule : "WBMol:00004910"
Public_name	"resveratrol"
Synonym	"Resveratrol"
Synonym  "Trans-resveratrol"
Synonym  "3,4',5-Trihydroxystilbene"
Synonym  "3,5,4'-Trihydroxystilbene"
Synonym  "Resvida"
IUPAC "5-[2-(4-hydroxyphenyl)ethenyl]benzene-1,3-diol"
SMILES "[H]C(=C([H])c1cc(O)cc(O)c1)c1ccc(O)cc1"
InChi "InChI=1S/C14H12O3/c15-12-5-3-10(4-6-12)1-2-11-7-13(16)9-14(17)8-11/h1-9,15-17H"
InChiKey "LUKBXSAWLPMMSZ-UHFFFAOYSA-N"
Database	"NLM_MeSH" "UID" "C059514"
Database	"CTD" "ChemicalID" "C059514"
Database	"ChEBI" "CHEBI_ID" "27881"
Database	"ChemIDplus" "RN" "501-36-0"
Database	"KEGG COMPOUND" "ACCESSION_NUMBER" "C03582"
Molecule_use	"antioxidant"
Reference	"WBPaper00026929"
Nonspecies_source "Human Food - Grapes"
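
Records in this shape could be generated from structured data along the lines of the Python sketch below. The function name is hypothetical, the separator between tag and values is a single space (following the note at the top of test.ace), and no escaping of embedded double quotes is attempted, so this is an illustration of the layout only.

```python
def ace_record(molecule_id, tag_values):
    """Render one Molecule object as .ace text.

    tag_values: list of (tag, [values]) pairs; an empty value list gives a
    bare tag line such as 'Metabolite'.
    Assumption: values contain no double quotes that would need escaping.
    """
    lines = [f'Molecule : "{molecule_id}"']
    for tag, values in tag_values:
        quoted = " ".join(f'"{v}"' for v in values)
        lines.append(f"{tag} {quoted}".rstrip())
    return "\n".join(lines) + "\n"


record = ace_record("WBMol:00001516", [
    ("Public_name", ["triethanolamine"]),
    ("Synonym", ["TEA"]),
    ("Database", ["ChEBI", "CHEBI_ID", "28621"]),
    ("Database", ["ChemIDplus", "RN", "102-71-6"]),
    ("Molecule_use", ["spermatid activator"]),
    ("Reference", ["WBPaper00040819"]),
])
print(record, end="")
```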

  • Database info for the chemical/small molecule database links needs to be manually edited on github in the external_urls file whenever there are changes to the databases associated with molecule data.


OBSOLETE: During each upload a molecule.ace file will be made in citace by Wen. This file will contain all the molecule cross references from within the RNAi and Variation Phenotype curation, merging them with the molecule data from the molecule list.

Papers flagged for chemical/molecule curation

WBPaperID Comment
461 tetramisole
464 Yes
484 Yes
493 enhanced sensitivity of flu-2 mutants to EMS
536 pharmacological analysis of cell function (spermatozoan motility) not gene function
1001 Vanadate, AMP-PNP, ATP-gamma-S, NEM, Triton X-100, Taxol, analysis on unknown motor protein
1010
1524 forskolin, NaF, AlCl3, GTPgammaS, GppNHp, GDPbetaS, pertussis toxin, cholera toxin
2029 camptothecin, berenil, trypanocidal drug, magnesium ion on DNA relaxation, and isolated topoisomerase, no gene product mentioned
2116
3137 sodium arsenite, mercuric chloride
3150 ouabain
13430 hematin, benzimidazole, Albendazole, Diethylcarbamazine, Fenbendazole, Hematin, Imidazole, Ivermectin, Levamisole, Mebendazole, Methimazole, Morantel tartrate, Oxibendazole, Piperazine, Pyrantel tartrate, Thiabendazole: only tested in vitro on H. contortus GST activity.
24940 aldicarb
24950 diacetyl
25017 fig 4
25173 Fig 5, Fig 7.
28357 yes
28562 octopamine
28879 serotonin
28900 fig.3
29114 yes
29130 fig.3, fig.4
30726 isoamyl alcohol
30928 fig.2 d) Exogenous serotonin and fluoxetine suppress 100G-induced DAF-16::GFP nuclear accumulation.
31225 yes
31321 Figure 3. High NaCl accelerates aging of C. elegans
31336 nitrogen mustard
31342 yes
31419 yes
31424 fig.7
31427 vitamin E
31428 yes
31456 tunicamycin
31464 yes
31468 fig.3
31474 yes
31482 FUDR
31483 mPyrazine
31490 yes
31509 yes
31530 yes
31535 yes
31571 yes
31593 yes
31604 nicotine
31626 yes
31627 yes
31644 yes
31657 yes
31667 yes
31669 ethanol
31672 yes
31682 yes
31683 yes
31690 yes
31694 yes
31703 haem
31810 yes
31824 yes
31834 yes
31850 fluoranthene
31857 yes
31866 yes
31871 DHP
31872 aldicarb
31873 arsenite
31882 8-Br-cGMP
31895 Figure 4. Responses of wild-type (N2) and slo-1 mutant worms to aldicarb and levamisole
31897 fig.4
31915 yes
31924 yes
31939 table 2
31941 paraquat
31959 yes
31977 imipramine
31982 glutamate
31991 Flavone (2-phenyl chromone)
31992 NaCl
31994 arsenite
31996 fig.1
31999 diltiazem
32000 arecoline
32007 levamisole
32008 aldicarb
32011 aldicarb
32031 NaCl
32033 yes
32035 yes
32050 yes
32072 amiloride
32077 C17ISO
32079 Dextran-HCC
32093 Quercetin
32101 cadmium, chlorpyrifos, nickel, prochloraz, diuron
32103 yes
32125 imidacloprid, thiacloprid
32131 yes
32142 phosphine, FCCP (carbonyl cyanide 4-(trifluoromethoxy)phenylhydrazone), PCP (2,3,4,5,6-pentachlorophenol), DNP (2,4-dinitrophenol), sodium azide
32143 yes
32181 Table S2. Activity of the ascarosides in Daf-c and Daf-d strain backgrounds in the dauer formation assay
32192 tunicamycin
32207 1NA-PP1
32215 butanone, benzaldehyde
32232 paraquat
32237 paraquat
32241 paraquat
32243 ceramide
32252 aldicarb
32255 H2O2
32259 3-methyladenine
32266 metabolite DA dafachronic acid, steroids
32271 ethanol aldicarb
32286 root exudate
32295 Ethosuximide
32319 NaCl and isoamyl alcohol (chemoattractants).
32335 serotonin
32336 benzaldehyde
32358 aldicarb
32359 ivermectin
32366 levamisole, nicotine
32390 flavonoid quercetin
32427 volatile anesthetic, halothane
32429 1-octanol
32470 Species specificity of dauer pheromone extracts between elegans and Pristionchus. Response of Pristionchus and Strongyloides to different dafachronic acids.
32475 mianserin methiothepin
32478 levamisole
32491 amino acids
32494 AgNO3, CdCl2, CrCl2, CoCl2, CuSO4, HgCl2, MnCl2, NiSO4, Pb(NO3)2, and ZnCl2
32508 chlorpyrifos oxon; cadmium chloride; hexachlorophene; neurotoxicants; chlorpyrifos, methyl mercury, chlordiazepoxide, tebuconazol; Cocaine; metals, ethanol, solvents, organophosphate; carbamate pesticides
32517 Fig 1.The structures of the dauer pheromone components, ascaroside C6 (1), ascaroside C9 (2), ascaroside C3 (3), and the less active ascaroside C7 (4). Fig 2. The chemical structures of the long-chain ascarosides from conditioned medium extracts from long-term dhs-28 cultures. The structural assignment of long-chain ascaroside 17 is tentative
32522 Sulfonamide CA inhibitors (CAIs) such as acetazolamide AZA, methazolamide MZA or ethoxzolamide EZA, many new sulfonamides (such as GUZ
32878 NaN3, benzaldehyde
32880 hydrogen peroxide
32881 5-HT; Octanol, DiD
32884 paraquat, juglone
32886 dafachronic acids; Supp. New compounds reported in this study are: III, S-III, S-V, S-XV, S-XVI, S-XXII, XXIII, S-XXIII, XXIV, XXVII.
32887
32888 dichlorvos,an organophosphorus insecticide, acetylcholinesterase, cadmium chloride
32889 sulfonamides; CAIs; 2-(hydrazinocarbonyl)-3-substituted-phenyl-1H-indole-5-sulfonamides possessing various 2-,3- or 4-substituted phenyl groups with methyl-, halogeno- and methoxy-functionalities, as well as the perfluorophenyl moiety; AZA and EZA
32901 CO2
32903 paraquat
32918 Thus, (25R)-Δ7-dafachronic acid (2a) is one order of magnitude more active than (25R)-Δ4-dafachronic acid (1a). Similar to a previous study, little or no activity was detected with (25R)-cholestenoic acid (3a)
32925 cycloheximide
32932 resveratrol peptone
32949 Paraquat, tunicamycin
32956 These chemicals were used on Pratylenchus penetrans: acetic acid, propionic acid, isobutyric acid, n-butyric acid, isovaleric acid, n-valeric acid, n-caproic acid. These chemicals were used on C. elegans: acetic acid, n-caproic acid.
32958 Fe-exposure
32968 tetramisole
32989 The levels of superoxide radical (·O2−), in both mitochondria and cytosol, are increased in sod-1(tm776) and sod-1(tm783) mutants.
32997 dauer pheromone
33002 aldicarb levamisole
33004 electrically evoked and light evoked pharyngeal cholinergic post synaptic potentials reduced in amplitude by nicotinic antagonists benzoquinonium chloride and d-tubocurarine. Effect dose dependent.
33009 checked
33024 chloramine-T (CHT), DTT, H2O2: tested for effects on both KVS-1 activity in vitro and on chemotaxis in vivo.
33037 dauer pheromone; 8-bromo-cGMP
33040 paraquat, excess O2
33049 Carbamate, Aldicarb, Carbofuran, Oxamyl, Neostigmine, Eserine, Organophosphates, Fenamiphos, Ethoprop, Parathion, Paraoxon, Phorate, Terbufos, meta-chloroperbenzoic acid, m-CPBA, diotioate
33051 identification of new daf-22 dependent dauer pheromone and mating pheromones ascr#7, ascr#8, and ascr#6.1
33060 green tea polyphenol epigallocatechin gallate (EGCG)
33077 flavopiridol olomoucine II
33086 hyperosmotic, sodium chloride, anoxia
33094 PMA induces the pnlp-29::gfp transgene
33099 dichlorvos, fenamiphos, organophosphates, organophosphorus pesticides, neurotoxicant, mefloquine
33115 low oxygen, lactacystin
33126 tunicamycin, glucose, deoxyglucose, sorbitol, glucose analog
33130 D-ribose affect larval growth
33158 Yes. Response to Diacetyl.
33162 The uncoupler CCCP (carbonylcyanide- 3-chlorophenylhydrazone) extends lifespan.
33166 dietary zinc
33168 Antipsychotic drugs, cyclosporin A
33189 Iron, PQS, PQS+Fe3
33433 copper
33441 Pb, Hg, Cd, and Cr metals. CdCl2, CrCl2, HgCl2, and Pb(NO3)2 in solution were used here: 2.5 µM, 50 µM, and 100 µM.
33448 Catechin Hydrogen peroxide catechin hydrate
33456 8-bromo cGMP, paraquat, vinpocetine, zaprinast, EHNA (erythro-9-[2-hydroxy-3-nonyl] adenine)
34686 CdCl2, CrCl2, HgCl2, Pb(NO3)2
34687 Screened ~54,000 chemicals from various libraries: Bioactives, natural product extracts, Analyticon purified natural product compounds, Diversity-oriented synthesis, ChemBridge kinases, ChemDiv, TimTec, MayBridge, ChemBridge. note: screened on glp-4(bn2);sek-1(km4) mutant worms Anti-infective hits that cure C. elegans of an E. faecalis infection at a concentration lower than the in vitro MIC with E. faecalis are grouped into 6 structural classes (representative structures shown).
34688 Bis-[4-methoxy-3-[3-(4-fluorophenyl)-6-(4-methylphenyl)-2(aryl)-tetrahydro-2H-pyrazolo[3,4-d]thiazol-5-yl]phenyl]methanes: nematicidal activity
34706 Exposure to examined metals caused severe lethality toxicities in L1- and L2-larvae...
34717 fig 1, 40 µM juglone caused a significant increase in lifespan
34757 juglone, paraquat
34758 Rotenone, paraquat
34766 Cry5B toxin
35074 tribendimidine, levamisole, pyrantel
35082 drugs: muscimol and serotonin
35083 myxothiazol, FCCP
35098 Rib1P, 2-deoxy-α-D-ribose 1-phosphate (dRib1P), uridine, UMP, UDP, UTP, ATP, 2-deoxyuridine, 5-FU, 5dFUR, orotidine 5'-phosphate, PPRP and 5-FU
35114 Pcm-1 mutant dauer larvae exposed to juglone develop into adults with a defect in egg-laying (Egl). Pcm-1 mutant eggs exposed to the oxidizing agents paraquat, homocysteine, and homocysteine thiolactone undergo a developmental delay more pronounced than that observed in wild-type animals. Pcm-1 mutant eggs exposed to homocysteine develop into adults with a defect in egg-laying.
35522 2-nonanone