Molecule
links to relevant pages
Caltech documentation
Example molecule pages
Overview
Molecule curation captures chemical and drug entities that have been shown to affect the biology of the worm. We provide links to other databases that deal with these molecule entities in greater detail.
- What we mean by small molecule
- drug
- metabolite (primary and secondary)
- monomers or very small oligomers of nucleic acids, proteins, and polysaccharides
- "Large collections of small molecules (molecular weight about 600 or less), of similar or diverse nature which are used for high-throughput screening analysis of the gene function, protein interaction, cellular processing, biochemical pathways, or other chemical interactions." (from nlm.nih.gov and wikipedia)
- Exogenous protein complexes with toxic effects, for example Bt toxin.
Model
Original model
///////////////////////////small molecule/chemical/drug ////////////////////////////
//
// ?Molecule
// * metabolites: precursors, intermediates, or end products of a metabolic pathway
// * monomeric or very small oligomeric nucleic acids (not RNAi primers), e.g. ATP, ADP, cAMP, GTP, trinucleotide repeats??
// * chemicals/drugs
// * minerals, ions, salts
//
////////////////////////////////////////////////////////////////////////////////////
?Molecule Name ?Text
          Public_name ?Text
          Synonym ?Text
          DB_info Database ?Database ?Database_field Text
          Gene_regulation Gene_regulator ?Gene_regulation XREF Molecule_regulator
          Regulate_expr_cluster ?Expression_cluster XREF Regulated_by_molecule //Wen WS228
          WBProcess ?WBProcess XREF Molecule
          Affects_phenotype_of Variation ?Variation ?Phenotype #Evidence
                               Strain ?Strain ?Phenotype #Evidence
                               Transgene ?Transgene ?Phenotype #Evidence
                               RNAi ?RNAi ?Phenotype #Evidence
                               Rearrangement ?Rearrangement ?Phenotype #Evidence //KY [110602 pad]
          Interaction ?Interaction XREF Molecule_regulator
          Molecule_use ?Text #Evidence
          Reference ?Paper XREF Molecule
          Remark ?Text #Evidence
Model elements
- Name-> WBMol:ID
- Originally, this field contained the MeSH ID. When no MeSH ID was available, a WBMol ID was assigned, to be replaced by the MeSH ID once one became available. However, scripts use the Name to construct URLs for the MeSH and CTD database links, so URLs built from WBMol IDs would either have to be suppressed or would be broken. In addition, the increased use of Molecule annotation in other curation pipelines caused confusion during object creation. These problems prompted a change of the Name field from MeSH IDs by default to WBMol IDs for all objects.
- Public name -> common name in elegans literature
- Synonym -> other names used in papers; entries are case sensitive. Many other names can be taken from CTD pages. Names need to be pipe separated.
- DB_info -> links to entity in other database. Database URLs need to be added to github external URL file.
Requested changes to model
WS253
Biofunction_affected GO_term_affected ?GO_term ?RO_term #Evidence
                     Gene_affected ?Gene ?RO_term #Evidence
                     Molecule_affected ?Molecule ?RO_term #Evidence
                     Other_affected ?Text ?RO_term #Evidence
                     WBProcess_affected ?WBProcess ?RO_term #Evidence
add community curator tags - update batch loaded data to witting
- Changes to other models
- ?Gene
- ?WBProcess
- Create ?RO_class //Relationship ontology class populated by RO.obo https://raw.githubusercontent.com/oborel/obo-relations/master/ro.obo the ontology can be browsed here: http://www.ontobee.org/browser/term.php?o=RO&iri=http://www.w3.org/2002/07/owl%23ObjectProperty&graph=http://purl.obolibrary.org/obo/merged/RO
Rationale for WS253 proposed changes
Examples for molecule biofunction curation
Title: Nicotinamide adenine dinucleotide extends the lifespan of Caenorhabditis elegans mediated by sir-2.1 and daf-16.
Authors: Hashimoto T; Horikawa M; Nomura T; Sakamoto K
Journal: Biogerontology
WBPaper00033112 / PMID 19370397
"This result suggests that NAD activates the caloric restriction pathway and DAF-2 / insulin signal pathway independently, and that daf-16 did not contribute to the signaling pathway of caloric restriction"
"These findings are consistent with previous reports (Tissenbaum and Guarente 2001, Yang et al. 2005; van der Horst et al. 2004, 2007; Brunet et al. 2004) and suggest that NAD activates SIR-2.1, prior to activating the transcriptional activity of DAF-16"
Molecule: NAD
Biofunction_role: Metabolite
Gene_affected: SIR-2.1 RO: directly activates (RO:0002406)
Gene_affected: DAF-16 RO: increases expression of (RO:0003003)
WBProcess_affected: DAF-2/insulin signal pathway RO: indirectly activates (RO:0002407)
---
Title: Genetic and molecular analysis of spe-27, a gene required for spermiogenesis in Caenorhabditis elegans hermaphrodites.
Authors: Minniti AN; Sadler C; Ward S
Journal: Genetics
Year: 1996-05
Doc ID: WBPaper00002446
"Tyramine release from the RIM (blue) activates LGC-55 anion channel, which is expressed in the neck muscles, RMD / SMD motor neurons, and the AVB forward premotor interneuron (purple)"
Molecule: Pronase
Molecule: TEA
GO_term_affected: spermatid maturation (GO:0048240) RO: indirectly activates (RO:0002407)
Gene_affected: LGC-55 RO: directly activates (RO:0002406)
---
Title: Identification of a cysteine residue important for the ATPase activity of C. elegans fidgetin homologue.
Authors: Yakushiji Y; Yamanaka K; Ogura T
Journal: FEBS Lett
Year: 2004-12-03
Doc ID: WBPaper00024642
"It was found that ADP strongly inhibits the ATP hydrolysis activity, as little ATPase activity was detected at 3-fold excess of ADP over ATP, suggesting that the F32D1.1 ATPase activity may be tightly regulated by the hydrolyzed product ADP"
Molecule: ADP
GO_term_affected: ATPase activity (GO:0016887) RO: directly inhibits (RO:0002408)
Gene_affected: F32D1.1 figl-1 RO: directly inhibits (RO:0002408)
---
Title: Bridging the phenotypic gap: real-time assessment of mitochondrial function and metabolism of the nematode Caenorhabditis elegans.
Authors: Lagido C; Pettitt J; Flett A; Glover LA
Journal: BMC Physiol
Year: 2008
Doc ID: WBPaper00031649
"Azide inhibits complex IV of the mitochondrial respiratory chain by binding reversibly to cytochrome c oxidase [24], this arrests the flow of electrons and leads to a decrease in ATP synthesis"
Molecule: Azide
GO_term_affected: mitochondrial electron transport, cytochrome c to oxygen (GO:0006123) RO: directly inhibits (RO:0002408)
GO_term_affected: mitochondrial electron transport, cytochrome c to oxygen (GO:0006123) RO: molecularly interacts with (RO:0002436)
---
Title: Functional analysis of pyrimidine biosynthesis enzymes using the anticancer drug 5-fluorouracil in Caenorhabditis elegans.
Authors: Kim S; Park DH; Kim TH; Hwang M; Shim J
Journal: FEBS J
Year: 2009-09
"No proteins exhibited uridine kinase activity when uridine was used as a substrate, but both C29F7.3 and F40F8.1 produced UDP when UMP was used as a substrate"
Molecule: UMP
Biofunction_role: Substrate
Gene_affected: C29F7.3 RO: input_of RO:0002352 <could use term, substrate_for>
Gene_affected: F40F8.1 RO: input_of RO:0002352
Molecule: UDP
Biofunction_role: Metabolite
Gene_affected: C29F7.3 RO: produced_by (RO:0003001) <could use term product_of>
Gene_affected: F40F8.1 RO: produced_by (RO:0003001)
---
Title: Mutations in the Caenorhabditis elegans serotonin reuptake transporter MOD-5 reveal serotonin-dependent and -independent activities of fluoxetine.
Authors: Ranganathan R; Sawin ER; Trent C; Horvitz HR
Journal: J Neurosci
Year: 2001-08-15
Doc ID: WBPaper00004809
"cat-4 encodes GTP cyclohydrolase I (C. Loer, personal communication), which is required for the synthesis of a biopterin cofactor needed for dopamine and 5-HT biosynthesis (Kapatos et al., 1999)."
Molecule: biopterin
Biofunction_role: Cofactor
GO_term_affected: dopamine biosynthesis
GO_term_affected: 5-HT biosynthesis
Gene_affected: cat-4 RO: produced_by (RO:0003001) <could use term product_of>
WS252
?Molecule Name ?Text //WBMoleculeID
          Public_name ?Text
          Formula ?Text //imported from ChEBI
          Monoisotopic_mass ?Float //need to set up automated calculation based on formula
          IUPAC ?Text //imported from ChEBI
          SMILES ?Text //imported from ChEBI
          InChi ?Text //imported from ChEBI
          InChiKey ?Text //imported from ChEBI
          Synonym ?Text //mainly from CTD, but also from papers and ChEBI
          DB_info Database ?Database ?Database_field ?Text //links molecule to ChEBI, CTD, KEGG, etc
          Status Detected #Evidence
                 Predicted #Evidence
          Detection_method ?Text #Evidence //NMR, MALDI-MS, HPLC-UV, shotgun lipidomics
          Extraction_method ?Text #Evidence //MeOH, exometabolome, MeOH/Chloroform, 5% trichloroacetic acid
          Nonspecies_source ?Text
          Chemical_synthesis #Evidence
          Endogenous_in ?Species #Evidence
          Biofunction_role Metabolite #Evidence //note there is no ?Text, which is part of the other Biofunction_role tags
                           Regulator ?Text #Evidence
                           Structural_component ?Text #Evidence
                           Cofactor ?Text #Evidence
                           Activator ?Text #Evidence
                           Inhibitor ?Text #Evidence
                           Product ?Text #Evidence
                           Substrate ?Text #Evidence
                           Ligand ?Text #Evidence
                           Receptor ?Text #Evidence
          Essential_for ?Species #Evidence
          WBProcess ?WBProcess XREF Molecule
          Regulate_expr_cluster ?Expression_cluster XREF Regulated_by_molecule // Wen WS228
          Affects_phenotype_of Variation ?Variation ?Phenotype #Evidence
                               Strain ?Strain ?Phenotype #Evidence
                               Transgene ?Transgene ?Phenotype #Evidence
                               RNAi ?RNAi ?Phenotype #Evidence
                               Rearrangement ?Rearrangement ?Phenotype #Evidence
          Interaction ?Interaction XREF Molecule_interactor // Chris WS244
          Molecule_use ?Text #Evidence
          Reference ?Paper XREF Molecule
          Remark
Rationale for WS252 changes
- New id tags
The following tags were requested by a user. They are important for metabolomics analysis and can be supplied by ChEBI (except monoisotopic mass, which can come from KEGG).
Formula ?Text
Monoisotopic_mass ?Float
IUPAC ?Text
SMILES ?Text
InChi ?Text
InChiKey ?Text
- Status tag
A user requested that we record whether a molecule has been detected in the worm or has only been predicted. Detection and extraction methods support the claim that a molecule has been detected.
Status Detected #Evidence
       Predicted #Evidence
Detection_method ?Text #Evidence //NMR, MALDI-MS, HPLC-UV, shotgun lipidomics
Extraction_method ?Text #Evidence //MeOH, exometabolome, MeOH/Chloroform, 5% trichloroacetic acid
- Source tags
A user requested that we indicate whether the molecule is made endogenously in the worm or is an essential molecule that must be supplied from elsewhere.
Chemical_synthesis #Evidence //Requested by F.Schroeder to index chemically synthesized molecules
Nonspecies_source ?Text
Endogenous_for ?Species #Evidence
Essential_in ?Species #Evidence
- Biofunction_role tags
These tags record specific roles, actions, and relationships between molecules and other entities. Biofunction role tags were requested by users and should make this information easy to mine from the database.
Biofunction_role Metabolite ?Text #Evidence
                 Regulator ?Text #Evidence
                 Structural_component ?Text #Evidence
                 Cofactor ?Text #Evidence
                 Activator ?Text #Evidence
                 Inhibitor ?Text #Evidence
                 Product ?Text #Evidence
                 Substrate ?Text #Evidence
                 Ligand ?Text #Evidence
                 Receptor ?Text #Evidence
///////////////////////////////////////////////////////////////////////////////////
Corresponding changes in touched models
///// ?Phenotype_info Affected_by Molecule ?Molecule #Evidence /////
///// ?Gene_regulation Regulator Molecule_regulator ?Molecule XREF Gene_regulator #Boolean /////
Molecule databases
Molecule IDs will be provided, when available, for the following databases:
- Database "NLM_MeSH" "UID"
- Database "CTD" "ChemicalID"
- Database "ChemIDplus" using the CasRN
- Database "ChEBI" "CHEBI_ID"
- Database "KEGG COMPOUND" "ACCESSION_NUMBER"
- Database "SMID-DB"
ChEBI
ChEBI terms and IDs are pulled from the chebi.obo on the ChEBI ftp server. This file is updated monthly (first Monday of the month except in the case of a Bank Holiday when it becomes the Tuesday). The ChEBI website is updated daily.
chebi.obo is stored on tazendra here: /home/postgres/work/pgpopulation/obo_oa_ontologies
Syncing with ChEBI
find_missing_chebi_in_other_oa.pl
Identifies any molecule in the mop tables that has no mop_chebi ID but has been used in curation, meaning it has data in mop_paper or in the molecule fields of the app, grg, pro, or rna tables. The output lists molecules that have not been manually matched to a ChEBI ID. These are of three types:
1. exists in chebi.obo - if so, manually fill in chebi ID
2. does not exist in chebi.obo but can be requested - if so, request term from chebi
3. does not exist in chebi.obo and cannot be requested - most likely the molecule is a protein or complex, neither are curated by chebi, e.g., Shiga Toxin, BTtoxin, streptolysin. Currently there is no alternative database to link these things to - should at some point see if there is UniProt id for them.
/home/acedb/karen/molecule/chebi_missing/find_missing_chebi_in_other_oa.pl
- checks molecule entries without chebi that have data for any of: app_molecule, grg_moleculeregulator, pro_molecule, rna_molecule, mop_paper
- prints out mop_publicname, mop_molecule, mop_chemi, mop_paper, mop_smmid, as well as the corresponding papers from other OAs
- aggregates all papers for a given molecule object and converts them to PMIDs where a PMID exists
- writes its output to the file 'out'
Output looks like this:
WBMol:00000139 Oxamniquine mop 139 mop_chemi 21738-42-1 mop_molecule D010073 mop_paper "WBPaper00038322","WBPaper00031181" aggregatedWBPapers WBPaper00031181, WBPaper00038322 aggregatedWBPmids pmid17988075, pmid21500322
WBMol:00000155 icariin mop 155 mop_chemi 489-32-7 mop_molecule C056599 mop_paper "WBPaper00040566" aggregatedWBPapers WBPaper00040566 aggregatedWBPmids pmid22216122
compare_chebi.pl(2015 11 04)
The goal of this script is to import more IDs from ChEBI into the mop_tables. The import relies on mop entries matching chebi objects. So the first thing the script does is to compare chebi.obo object name with mop_publicname and mop_synonym, in a case insensitive way. chebi.obo comes from ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.obo
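The case-insensitive name comparison described above can be sketched in Python as follows. This is a minimal illustration and not the actual compare_chebi.pl logic: the function names, the in-memory dictionaries standing in for the mop tables, and the hand-written sample stanza are all assumptions made for the example.

```python
def parse_obo_names(obo_text):
    """Map each [Term] id to the set of its lower-cased names and synonyms."""
    names = {}
    current = None      # id of the [Term] stanza being read, if any
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas describe molecules.
            in_term = (line == "[Term]")
            current = None
        elif in_term and line.startswith("id: "):
            current = line[len("id: "):]
            names[current] = set()
        elif current and line.startswith("name: "):
            names[current].add(line[len("name: "):].lower())
        elif current and line.startswith("synonym: "):
            # The synonym text is the first double-quoted string on the line.
            start = line.find('"')
            end = line.find('"', start + 1)
            if start != -1 and end != -1:
                names[current].add(line[start + 1:end].lower())
    return names


def match_by_name(mop_names, obo_names):
    """mop_names: {WBMol id: [public name, synonyms...]} (hypothetical layout).

    Returns {WBMol id: sorted list of ChEBI ids sharing a name, ignoring case}.
    """
    index = {}
    for chebi_id, known in obo_names.items():
        for name in known:
            index.setdefault(name, set()).add(chebi_id)
    matches = {}
    for mol_id, candidates in mop_names.items():
        hits = set()
        for name in candidates:
            hits |= index.get(name.lower(), set())
        matches[mol_id] = sorted(hits)
    return matches


# Hand-written sample stanza for illustration only, not copied from chebi.obo.
SAMPLE_OBO = """\
[Term]
id: CHEBI:27881
name: resveratrol
synonym: "3,4',5-Trihydroxystilbene" RELATED [ChEBI:]
"""

if __name__ == "__main__":
    obo = parse_obo_names(SAMPLE_OBO)
    print(match_by_name({"WBMol:00004910": ["Resveratrol", "Resvida"]}, obo))
```

Molecules that come back with an empty list are the ones that would still need a manual ChEBI match or a term request.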
The script outputs three files:
- chebis_in_mop_vs_obo - compares chebi.obo and mop tables, finds molecules in mop tables that do not exist in the chebi.obo - based on name,
100241 not in mop
100246 not in mop
100461 not in mop
100921 not in chebi
101085 not in mop
101096 not in mop
- chebi_name_discrepancies_case_sensitive and chebi_name_discrepancies
These files list the mop entries that have ChEBI IDs but whose public names or synonyms do not match the ChEBI names. The first file is case sensitive, so many of its entries (467 lines) differ only in case. The second file is case insensitive and is much shorter (73 lines). I went through these manually and verified that the entities were the same, with two exceptions:
CHEBI:30829 petroselaidic acid (CasRN 593-40-8), MOP: petroselinic acid (CasRN 4712-34-9)
CHEBI:53692 hypaque (CasRN 117-96-4), MOP: histopaque (CasRN 737-31-5)
Once names are matched, the corresponding values will be imported into mop_tables as follows:
mop_table | chebi.obo entry example |
mop_formula | synonym: "<formula>" RELATED FORMULA [ChEBI:] (changed) |
mop_iupac (table not created?) | synonym: "<NAME>" EXACT IUPAC_NAME [IUPAC:] |
mop_smiles | synonym: "<smiles>" RELATED SMILES [ChEBI:] |
mop_inchi | synonym: "<InChI=>" RELATED InChI [ChEBI:] |
mop_inchikey | synonym: "<inchikey>" RELATED InChIKey [ChEBI:] |
mop_kegg | KEGG COMPOUND:C00742 "KEGG COMPOUND" (added) |
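As an illustration of the mapping above, the following Python sketch pulls the typed synonym values and the KEGG xref out of the lines of one chebi.obo stanza. The regular expressions and the function name are assumptions based only on the entry examples listed here, the xref: prefix on the KEGG line is assumed, and the formula table is named mop_formula as in the mapping (it appears as mop_molformula elsewhere on this page).

```python
import re

# synonym: "<value>" <SCOPE> <TYPE> [<source>:]  (shape taken from the examples above)
SYNONYM_RE = re.compile(r'^synonym: "(?P<value>.*)" (?P<scope>[A-Z]+) (?P<kind>\w+) \[')
# xref: KEGG COMPOUND:C00742 "KEGG COMPOUND"  (the "xref: " prefix is an assumption)
KEGG_RE = re.compile(r'^xref: KEGG COMPOUND:(?P<acc>\S+)')

# Synonym type in chebi.obo -> mop table that receives the value.
TYPE_TO_TABLE = {
    "FORMULA": "mop_formula",
    "IUPAC_NAME": "mop_iupac",
    "SMILES": "mop_smiles",
    "InChI": "mop_inchi",
    "InChIKey": "mop_inchikey",
}


def extract_chebi_values(stanza_lines):
    """Return {mop table: value} for the typed entries found in one stanza."""
    values = {}
    for raw in stanza_lines:
        line = raw.strip()
        syn = SYNONYM_RE.match(line)
        if syn and syn.group("kind") in TYPE_TO_TABLE:
            # Keep the first value seen for each table.
            values.setdefault(TYPE_TO_TABLE[syn.group("kind")], syn.group("value"))
            continue
        kegg = KEGG_RE.match(line)
        if kegg:
            values.setdefault("mop_kegg", kegg.group("acc"))
    return values
```

Plain synonyms without a type keyword do not match the pattern and are ignored, so only the identifier-like values reach the mop tables.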
Molecule curation
Bulk (batch) upload
Instructions for general batch upload here
Create spread sheet with column names that match molecule OA tables
Use http://www.metaboanalyst.ca/faces/home.xhtml to map ids - at least need Chebi IDs from KEGG
mop_paper mop_name mop_publicname mop_synonym mop_chemi mop_kegg mop_chebi mop_molformula mop_biorole mop_bioroletext mop_status mop_detctmethod mop_otherdetctmethod mop_extrctmethod mop_otherextrctmethod mop_chemicalsynthesis mop_nonbiosource mop_endogenousin
- grab WBMol IDs and place in mop_name column
- place WBPaperIDs in quotes as the OA field is multiontology
- use species code for mop_endogenousin , i.e., for Caenorhabditis elegans, enter 6239
- test upload on mangolassi with
./populate_oa_tab_file.pl mangolassi #### realfile.tsv where #### is WBPersonID
Populate missing mop_table values from chebi.obo with
/home/postgres/work/pgpopulation/mop_molecule/20151104_chebi_again/populate_other_chebi.out
Make sure the chebi.obo file is in the same directory.
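To make the spreadsheet conventions above concrete, here is a minimal Python sketch that writes batch-upload rows as tab-separated text. The column list is the one given above; the record layout, the function names, the header row, and the bare "|" used to join synonyms are assumptions for illustration, so check them against the general batch upload instructions before use.

```python
COLUMNS = [
    "mop_paper", "mop_name", "mop_publicname", "mop_synonym", "mop_chemi",
    "mop_kegg", "mop_chebi", "mop_molformula", "mop_biorole", "mop_bioroletext",
    "mop_status", "mop_detctmethod", "mop_otherdetctmethod", "mop_extrctmethod",
    "mop_otherextrctmethod", "mop_chemicalsynthesis", "mop_nonbiosource",
    "mop_endogenousin",
]


def build_row(record):
    """Format one molecule record (a dict keyed by column name) as a TSV line."""
    values = dict(record)
    # mop_paper is a multiontology field, so every WBPaper ID goes in quotes.
    values["mop_paper"] = ",".join(f'"{p}"' for p in record.get("mop_paper", []))
    # Synonyms are pipe separated (exact spacing is an assumption).
    values["mop_synonym"] = "|".join(record.get("mop_synonym", []))
    # Missing columns are left empty.
    return "\t".join(str(values.get(col, "")) for col in COLUMNS)


def build_tsv(records):
    """Return the whole upload file: a header row followed by one row per record."""
    lines = ["\t".join(COLUMNS)]
    lines.extend(build_row(r) for r in records)
    return "\n".join(lines) + "\n"
```

For mop_endogenousin the record should carry the species code, for example 6239 for Caenorhabditis elegans.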
Drug-phenotype curation
Molecules will be linked to genes based on their influence on gene activity altered by variation, overexpression, and RNAi-based knockdown.
Drug-gene interactions
Molecules will also be linked to genes through their influence on gene activity directly through gene regulation interactions.
--kjy 20:03, 16 December 2011 (UTC)
Molecule OA
Changes for WS252
- syncing with chebi.obo
on mangolassi: /home/postgres/work/pgpopulation/mop_molecule/20151104_chebi_again/
- chebis_in_mop_vs_obo - chebi IDs in one file but not the other
- chebi_name_discrepancies - among chebi IDs that exist in both sets, entries where no name or synonym matches the other
- compare_chebi.pl - works off of chebi.obo; can be run in its own directory with an updated chebi.obo (copy the script to the directory with the new version of chebi.obo)
- pgpopulation script on mangolassi: /home/postgres/work/pgpopulation/mop_molecule/20151104_chebi_again/populate_other_chebi.out
- incorporates chebi values for the other tables (mop_kegg, mop_smiles, mop_inchi, mop_inchikey, mop_molformula, mop_iupac; note: not mop_exactmass)
- should fill in the KEGG compound ID from chebi.obo when the KEGG compound ID in mop_kegg is missing. If the KEGG compound IDs differ between chebi.obo and mop_kegg, it replaces the value with the chebi.obo value and reports the discrepancy in the output.
- The output tells you if a chebi ID does not exist. 'HAS MOP' marks entries that have both an obo and a mop value; 'same' vs 'diff' indicates whether the obo and mop values are the same or different.
mop_table | chebi.obo entry example |
mop_molformula | synonym: "<formula>" RELATED FORMULA [ChEBI:] (changed) |
mop_iupac (table not created?) | synonym: "<NAME>" EXACT IUPAC_NAME [IUPAC:] |
mop_smiles | synonym: "<smiles>" RELATED SMILES [ChEBI:] |
mop_inchi | synonym: "<InChI=>" RELATED InChI [ChEBI:] |
mop_inchikey | synonym: "<inchikey>" RELATED InChIKey [ChEBI:] |
mop_kegg | KEGG COMPOUND:C00742 "KEGG COMPOUND" (added) |
mop_chemi | xref: ChemIDplus:2140-79-6 "CAS Registry Number" (added 11/27/15) |
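The KEGG reconciliation rule described for the pgpopulation script (fill in missing mop_kegg values, overwrite differing ones with the chebi.obo value, and report the difference) could look roughly like this in Python. This is a sketch of the rule as stated on this page, not the script itself: the dictionaries keyed by molecule ID, the function name, and the report line layout are hypothetical.

```python
def reconcile_kegg(mop_kegg, obo_kegg):
    """Apply chebi.obo KEGG compound IDs to mop_kegg values.

    mop_kegg: {molecule id: KEGG ID currently in mop_kegg, or "" if missing}
    obo_kegg: {molecule id: KEGG ID found in the matched chebi.obo stanza}
    Returns (updated mop_kegg dict, list of report lines).
    """
    updated = dict(mop_kegg)
    report = []
    for mol_id, obo_value in sorted(obo_kegg.items()):
        current = updated.get(mol_id, "")
        if not current:
            # Missing in mop_kegg: take the chebi.obo value.
            updated[mol_id] = obo_value
            report.append(f"{mol_id}\tNO MOP\tadded {obo_value}")
        elif current == obo_value:
            report.append(f"{mol_id}\tHAS MOP\tsame\t{current}")
        else:
            # Both exist but differ: chebi.obo wins and the change is reported.
            updated[mol_id] = obo_value
            report.append(f"{mol_id}\tHAS MOP\tdiff\t{current} -> {obo_value}")
    return updated, report
```

Returning the report separately from the updated values keeps the discrepancies available for manual review before anything is written back to postgres.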
Initial mop tables
For entering, editing, and storing information about molecule objects.
on tazendra or mangolassi:
~postgres/public_html/cgi-bin/oa/wormOA.pm
&initWormFields is like a switch to load a specific OA. Each OA has a 3-letter code that corresponds to the postgres table name prefix, so 'mop' is for the molecule OA and the 'mop_' tables in postgres. 'mop' then calls &initWormMopFields. Currently, around line 1200, there are sets of hashes like:
$fields{mop}{id}{type} = 'text';
$fields{mop}{id}{label} = 'pgid';
$fields{mop}{id}{tab} = 'all';
$fields{mop}{paper}{type} = 'multiontology';
$fields{mop}{paper}{label} = 'WBPaper';
$fields{mop}{paper}{tab} = 'all';
$fields{mop}{paper}{ontology_type} = 'WBPaper';
$fields{mop}{name}{type} = 'text';
$fields{mop}{name}{label} = 'Name';
$fields{mop}{name}{tab} = 'all';
Most of these correspond to a postgres table. The first one, the pgid, is the exception. So, excluding that, the example above has 2 postgres tables: mop_paper and mop_name.
- {type} is the type of data they hold
- text
- bigtext
- ontology
- dropdown
- toggle
- multiontology
- multidropdown
- {label} is the field name in the OA
- {tab} is which tab number it should show in. 'all' shows up in all tabs, but is mostly used in OAs with no numbered tabs
- {ontology_type} if {type} is ontology / multiontology, ontology_type refers to the type of values that go there for the autocomplete and validation.
- {ontology_table} if the {type} is ontology / multiontology, for example:
$fields{mop}{chebi}{type} = 'ontology';
$fields{mop}{chebi}{label} = 'ChEBI_ID';
$fields{mop}{chebi}{tab} = 'all';
$fields{mop}{chebi}{ontology_type} = 'obo';
$fields{mop}{chebi}{ontology_table} = 'chebi';
This corresponds to the obo_ tables that are updated via cronjob or populated just once; in this case it maps to obo_<name|data|syn>_chebi.
- {dropdown_type} if the {type} is dropdown, which refers to the type of data like the {ontology_type}, but the values are hardcoded further down in the code instead of stored in postgres for querying.
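The relationship between the field hashes and the postgres tables can be illustrated with a small Python mirror of the configuration. The real configuration lives in the Perl module wormOA.pm; the dictionary below only copies the example entries shown above, and both helper functions are hypothetical names introduced for this sketch.

```python
# Python mirror of the example %fields entries shown above (illustration only).
FIELDS = {
    "mop": {
        "id": {"type": "text", "label": "pgid", "tab": "all"},
        "paper": {"type": "multiontology", "label": "WBPaper", "tab": "all",
                  "ontology_type": "WBPaper"},
        "name": {"type": "text", "label": "Name", "tab": "all"},
        "chebi": {"type": "ontology", "label": "ChEBI_ID", "tab": "all",
                  "ontology_type": "obo", "ontology_table": "chebi"},
    }
}


def postgres_tables(fields, oa):
    """Every field except the pgid ('id') corresponds to a <oa>_<field> table."""
    return [f"{oa}_{field}" for field in fields[oa] if field != "id"]


def obo_tables(field_config):
    """For an 'obo' ontology field, list the obo_<name|data|syn>_<table> tables."""
    if field_config.get("ontology_type") != "obo":
        return []
    table = field_config["ontology_table"]
    return [f"obo_{kind}_{table}" for kind in ("name", "data", "syn")]


if __name__ == "__main__":
    print(postgres_tables(FIELDS, "mop"))
    print(obo_tables(FIELDS["mop"]["chebi"]))
```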
mop_tables
WS254 Requested changes | WS253 Requested changes | Model tag | table | type | OA label | tab | ontology_type | ontology_table | cross table population script | comments and past change requests (WSVersion) |
pgonly | mop_id | not a table | pgid | 1 | - | - | - | WBMol: $molId = &pad8Zeros($newPgid) | ||
pgonly | mop_timestamp | not a table | none | - | - | - | - | |||
pgonly | mop_curator | dropdown | Curator | 1 | - | - | - | same values in all OA curation tables | ||
Name | mop_name | text | Name | 1 | - | - | - | WBMolID -added after tables were built | ||
Public_name | mop_publicname | bigtext | Public_name | all | - | - | - | |||
Synonym | mop_synonym | bigtext | Synonyms | 1 | - | - | - | |||
DB_info | mop_molecule | text | MeSH / CTD or default | 1 | - | - | - | use MeSH ID here; originally used as Name. WS252 - change field name to MeSH/CTD | ||
DB_info | mop_chemi | text | CasRN | 1 | - | - | - | |||
DB_info | mop_chebi | ontology | ChEBI_id | 1 | obo | chebi | downloaded from ChEBI server, updated monthly, ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.obo | |||
DB_info | mop_kegg | text | Kegg compound (Acc#) | 1 | - | - | - | |||
Remark | mop_remark | bigtext | Remark | 1 | - | - | - | |||
Reference | mop_paper | multiontology | WBPaper | all | WBPaper | pap_tables | - | |||
DB_info | mop_smmid | text | SMID-DB | 1 | - | - | - | |||
InChi | mop_inchi | text | InChi(standard) | 1 | - | - | import from chebi.obo, WS252-DUMP | |||
InChiKey | mop_inchikey | text | InChi key | 1 | - | - | import from chebi.obo, WS252-DUMP | |||
SMILES | mop_smiles | text | SMILES | 1 | - | - | ||||
use for calculating mop_exactmass | Formula | mop_molformula | text | Formula | 1 | - | - | import from chebi.obo, WS252-DUMP | ||
IUPAC | mop_iupac | text | IUPAC | 1 | - | - | import from chebi.obo, WS252-ADD and DUMP | |||
Calculate this value see section below | Monoisotopic_mass | mop_exactmass | read-only | Exact mass | 1 | - | - | figure out how to calculate based on mop_molformula, WS252-DUMP | ||
Molecule_use | mop_moleculeuse | bigtext | Molecule use | 2 | - | - | - | |||
Not in model | mop_gotarget | multiontology | GO target | 2 | obo | goid | - | WS252-DELETED | ||
Not in model | mop_genetarget | multiontology | Gene target | 2 | WBGene | gin_tables | - | WS252-DELETED | ||
mop_classified | text | Classified as | 2 | dropdown | - | - | WS252-DELETED | |||
mop_source | text | source | 2 | dropdown | - | - | WS252-DELETED | |||
mop_species | text | Species | 2 | - | - | WS252-DELETED | ||||
mop_role | dropdown | Role | 2 | - | - | WS252-DELETED | ||||
Biofunction_role | mop_biorole | dropdown | BioRole | 2 | dropdown values{Metabolite, Regulator, Structural_component, Cofactor, Activator, Inhibitor, Product, Substrate, Ligand, Receptor} | - | - | WS252-ADD and DUMP with evidence | ||
mop_bioroletext | text | BioRoleText | 2 | - | corresponds to ?Text in model lines like : Structural_component ?Text #Evidence under the Biofunction_role tag | WS252-ADD and DUMP in line with Biofunction_role tag and evidence | ||||
Essential_for | mop_essentialfor | ontology | EssentialForSpecies | 2 | species list as in dis_species? | - | - | WS252-ADD and DUMP with evidence | ||
Status | mop_status | dropdown | Status | 2 | dropdown{Detected, Predicted} | - | - | WS252-ADD and DUMP with evidence | ||
WS253 add to dropdown list HR-MAS-NMR GC-MS LC-MS LC-Coularray |
Detection_method | mop_detctmethod | dropdown | DetectionMethod | 2 | dropdown values{NMR, MALDI-MS, HPLC-UV, shotgun lipidomics} | - | - | WS252-ADD and DUMP with evidence | |
Detection_method | mop_otherdetctmethod | text | OtherDetectionMethod | 2 | - | - | WS252-ADD and DUMP with evidence | |||
WS253 add to dropdown list 80% MeOH , ACN /0.1M NaCl , EtOH ,M9 MeOH/CHCl3 MeOH/MTBE phosphate buffer mobile phase M9 buffer |
Extraction_method | mop_extrctmethod | dropdown | ExtractionMethod | 2 | dropdown values{MeOH, exometabolome, MeOH/Chloroform, 5% trichloroacetic acid} | - | - | WS252-ADD and DUMP with evidence | |
Extraction_method | mop_otherextrctmethod | text | OtherExtractionMethod | 2 | - | - | WS252-ADD and DUMP with evidence | |||
Chemical_synthesis | mop_chemicalsynthesis | toggle | Chemical synthesis | 2 | - | - | WS252-ADD and DUMP with evidence | |||
Nonspecies_source | mop_nonbiosource | text | NonSpeciesSource | 2 | - | - | WS252-ADD and DUMP | |||
Endogenous_in | mop_endogenousin | ontology | EndogenousSpecies | 2 | species list as in app_pathogen - obo tables for ncbitaxonid | - | - | WS252-ADD and DUMP with evidence | ||
WS253 Add and dump | mop_communitycurator | ontology | CommCurator | 2 | person ontology | - | - | |||
WS253 Add and dump | mop_communitycuratoremail | text | CommCuratorEmail | 2 | - | - | - | |||
Gene_regulation | grg_moleculeregulator | multiontology | Molecule Regulator | GENEREG tab3 | Molecule | mop_tables | - | |||
WBProcess | pro_molecule | multiontology | Molecule | PROCESS tab1 | Molecule | mop_tables | - | |||
Phenotype | app_molecule | multiontology | Molecule | PHENOTYPE tab2 | Molecule | mop_tables | - | |||
RNAi | rna_molecule | multiontology | Molecule | RNAi tab2 | Molecule | mop_tables | - | |||
Interaction | int_moleculenondir int_moleculeone int_moleculetwo | Molecule | Effected molecule Affected molecule Nondirectional molecule | INTERACTION tab3 | Molecule | mop_tables | - | Molecules were initially curated through a/effected_other fields and stored as text in a field in tab4. |
- "all" can mean all tabs, but also is used for OA's that only have one tab.
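The mop_id row in the table above builds the WBMol ID from the postgres pgid with &pad8Zeros. A Python equivalent of that padding, together with the reverse conversion, might look like the sketch below; both function names are hypothetical, and the 8-digit width is taken from the helper name and from IDs such as WBMol:00000139 on this page.

```python
def wbmol_id(pgid):
    """Build a WBMol ID by zero-padding the pgid to 8 digits."""
    if pgid < 0:
        raise ValueError("pgid must be non-negative")
    return f"WBMol:{pgid:08d}"


def pgid_from_wbmol(mol_id):
    """Recover the integer pgid from a WBMol ID such as 'WBMol:00000139'."""
    prefix = "WBMol:"
    if not mol_id.startswith(prefix) or not mol_id[len(prefix):].isdigit():
        raise ValueError(f"not a WBMol ID: {mol_id!r}")
    return int(mol_id[len(prefix):])
```

With these helpers, pgid 139 corresponds to WBMol:00000139, which matches the Oxamniquine entry in the find_missing_chebi_in_other_oa.pl output shown earlier.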
Calculating monoisotopic mass
Also called Exact mass
Calculate the mass of the entry in mop_molformula when it exists. Examples include:
- CH3COOH
- H2SO4
- Mg(OH)2 -> this needs to be translated to Mg + O(2) + H(2)
Calculation is the sum of the isotopic masses of each occurrence of each element in the mol formula.
Ex: CH3COOH has 2 C, 4 H, and 2 O.
Masses are C = 12.000000, H = 1.007825, O = 15.994915.
So the isotopic mass is 2(12.000000) + 4(1.007825) + 2(15.994915) = 60.021130.
Enter 60.021130 into mop_exactmass.
Masses are retrieved from http://www2.warwick.ac.uk/fac/sci/chemistry/chemintra/postgrad/taughtmasters/mscinfo/ch908/atomic_mass_abund.pdf
You need to use the value with the highest abundance for those elements that have multiple masses.
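The calculation above, including the expansion of grouped formulas such as Mg(OH)2, can be sketched in Python as follows. This is one possible approach and not an existing WormBase script. The C, H, and O masses are the ones quoted above; the N, P, S, and Mg values are standard most-abundant-isotope masses rounded to six decimals and are an addition for this example. The function names are hypothetical.

```python
import re

# Mass of the most abundant isotope of each element.
# C, H, O are from the worked example above; the rest are assumed values.
MONOISOTOPIC = {
    "C": 12.000000, "H": 1.007825, "O": 15.994915,
    "N": 14.003074, "P": 30.973762, "S": 31.972071, "Mg": 23.985042,
}

# One token: an element with optional count, "(", or ")" with optional count.
TOKEN_RE = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")


def count_atoms(formula):
    """Return {element: count}, expanding groups like Mg(OH)2 -> Mg, O2, H2."""
    stack = [{}]
    pos = 0
    while pos < len(formula):
        match = TOKEN_RE.match(formula, pos)
        if not match:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        element, count, open_paren, close_paren, group_count = match.groups()
        if element:
            stack[-1][element] = stack[-1].get(element, 0) + int(count or 1)
        elif open_paren:
            stack.append({})
        else:
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {formula!r}")
            group = stack.pop()
            multiplier = int(group_count or 1)
            for el, n in group.items():
                stack[-1][el] = stack[-1].get(el, 0) + n * multiplier
        pos = match.end()
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {formula!r}")
    return stack[0]


def monoisotopic_mass(formula):
    """Sum the isotopic masses of every atom in the formula."""
    total = 0.0
    for element, n in count_atoms(formula).items():
        if element not in MONOISOTOPIC:
            raise ValueError(f"no mass on file for element {element!r}")
        total += MONOISOTOPIC[element] * n
    return total


if __name__ == "__main__":
    print(round(monoisotopic_mass("CH3COOH"), 6))
```

For CH3COOH this reproduces the 60.021130 worked out above to six decimal places. Elements missing from the mass table raise an error, so nothing is silently skipped; charged species would also need an electron-mass correction, which this sketch does not attempt.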
I can provide a better table for reference
Molecule list
Molecule names have changed to WBMoleculeIDs
Initially, we will be using MeSH UIDs, assigned by the NLM, as IDs for the molecules in our database. Due to the more comprehensive coverage of the NLM molecules, and the fact that it is more stably funded, this source was thought to be a good starting point for this project. The list we are starting with has been pared down and was created by the Comparative Toxicogenomic Database (CTD), which contains over 130,000 terms. For each term, this list contains a term name, CTD ID, MeSH UID, and where available CAS Registry Numbers. Using the CasRNs, we extracted the ChEBI ID from the Chemical Entities of Biological Interest database entity list, where it existed, along with any KEGG Compound accession number.
A sample molecule.ace record:
Molecule : "C009687"
Public_name "wortmannin"
Database "NLM_MeSH" "UID" "C009687"
Database "CTD" "ChemicalID" "C009687"
Database "ChemIDplus" "19545-26-7"
Database "ChEBI" "CHEBI_ID" "52289"
Database "KEGG COMPOUND" "ACCESSION_NUMBER" "C15181"
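A record in this shape can be generated from structured data with a few lines of Python. The sketch below is only an illustration of the .ace layout shown in the sample; the function name and argument layout are hypothetical, and it is not the dump_molecule_ace.pl dumper.

```python
def ace_record(name, public_name, db_links):
    """Format one Molecule object as .ace text.

    db_links is a list of tuples, e.g. ("ChEBI", "CHEBI_ID", "52289"); each
    tuple becomes one Database line with every part in double quotes.
    Fields are separated by single spaces, not tabs.
    """
    lines = [f'Molecule : "{name}"', f'Public_name "{public_name}"']
    for link in db_links:
        lines.append("Database " + " ".join(f'"{part}"' for part in link))
    # .ace objects are separated by a blank line, so end with one.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(ace_record(
        "C009687",
        "wortmannin",
        [
            ("NLM_MeSH", "UID", "C009687"),
            ("CTD", "ChemicalID", "C009687"),
            ("ChEBI", "CHEBI_ID", "52289"),
            ("KEGG COMPOUND", "ACCESSION_NUMBER", "C15181"),
        ],
    ), end="")
```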
To make a working list of reference molecules for the various curation efforts, we used Textpresso to scan for all terms on the list that have been published in the C. elegans corpus. The resulting list is less than 6000 terms. The terms that have been identified in the corpus and available
based on frequency in corpus and
concatenated into one file
This last file is being used as a starting file for molecule look-up by WB curators.
Caveats and notes:
- The list is now small enough that if we wanted to load it into WB at least we know that every term has some relevance to the literature (although unverified).
- The list is small enough to be amenable to editing through ontology editors like OBOedit (even though it is not an ontology).
- We do not have definitions of the terms, nor are the terms arranged in any hierarchical manner; however other databases do, and we provide links to those websites if an ID is available.
- Terms and synonyms will be added as needed. This curation effort still needs to be worked out; ideally the list will be incorporated as a selection list in whatever curation tool a curator is using.
Molecule Curation Pipeline
Molecule Upload
- Dumper - Molecule.ace made from /home/acedb/karen/WS_upload_scripts/Molecule/dump_molecule_ace.pl
- changes needed for WS252 - see table http://wiki.wormbase.org/index.php/Molecule#mop_tables
test.ace
#NOTE: use spaces instead of tabs

Molecule : "WBMol:00001516"
Public_name "triethanolamine"
Synonym "TEA"
Database "NLM_MeSH" "UID" "C009546"
Database "CTD" "ChemicalID" "C009546"
Database "ChEBI" "CHEBI_ID" "28621"
Database "ChemIDplus" "RN" "102-71-6"
Database "KEGG COMPOUND" "ACCESSION_NUMBER" "C06771"
Molecule_use "spermatid activator"
Reference "WBPaper00040819"

Molecule : "WBMol:00001853"
Public_name "NAD"
Monoisotopic_mass "664.116399"
Formula "C21H28N7O14P2"
IUPAC "adenosine 5'-{3-[1-(3-carbamoylpyridinio)-1,4-anhydro-D-ribitol-5-yl] dihydrogen diphosphate}"
SMILES "[H][C@]1(COP(O)(=O)OP(O)(=O)OC[C@@]2([H])O[C@@]([H])([N+]3=CC=CC(=C3)C(O)=N)[C@]([H])(O)[C@]2([H])O)O[C@@]([H])(N2C=NC3=C(N)N=CN=C23)[C@]([H])(O)[C@]1([H])O"
InChi "InChI=1S/C21H27N7O14P2/c22-17-12-19(25-7-24-17)28(8-26-12)21-16(32)14(30)11(41-21)6-39-44(36,37)42-43(34,35)38-5-10-13(29)15(31)20(40-10)27-3-1-2-9(4-27)18(23)33/h1-4,7-8,10-11,13-16,20-21,29-32H,5-6H2,(H5-,22,23,24,25,33,34,35,36,37)/p+1/t10-,11-,13-,14-,15-,16-,20-,21-/m1/s1"
InChiKey "BAWFJGJZGIEFAR-NNYOXOHSSA-O"
Database "NLM_MeSH" "UID" "D009243"
Database "CTD" "ChemicalID" "D009243"
Database "ChEBI" "CHEBI_ID" "13393"
Database "ChemIDplus" "RN" "53-84-9"
Metabolite
Endogenous_in "Caenorhabditis elegans"

Molecule : "WBMol:00004910"
Public_name "resveratrol"
Synonym "Resveratrol"
Synonym "Trans-resveratrol"
Synonym "3,4',5-Trihydroxystilbene"
Synonym "3,5,4'-Trihydroxystilbene"
Synonym "Resvida"
IUPAC "5-[2-(4-hydroxyphenyl)ethenyl]benzene-1,3-diol"
SMILES "[H]C(=C([H])c1cc(O)cc(O)c1)c1ccc(O)cc1"
InChi "InChI=1S/C14H12O3/c15-12-5-3-10(4-6-12)1-2-11-7-13(16)9-14(17)8-11/h1-9,15-17H"
InChiKey "LUKBXSAWLPMMSZ-UHFFFAOYSA-N"
Database "NLM_MeSH" "UID" "C059514"
Database "CTD" "ChemicalID" "C059514"
Database "ChEBI" "CHEBI_ID" "27881"
Database "ChemIDplus" "RN" "501-36-0"
Database "KEGG COMPOUND" "ACCESSION_NUMBER" "C03582"
Molecule_use "antioxidant"
Reference "WBPaper00026929"
Nonspecies_source "Human Food - Grapes"
- Database info for the chemical/small molecule database links needs to be manually edited on github in the external_urls file whenever there are changes to the databases associated with molecule data.
OBSOLETE: During each upload a molecule.ace file will be made in citace by Wen. This file will contain all the molecule cross references from within the RNAi and Variation Phenotype curation, merging them with the molecule data from the molecule list.
Papers flagged for chemical/molecule curation
WBPaperID | Comment |
461 | tetramisole |
464 | Yes |
484 | Yes |
493 | enhanced sensitivity of flu-2 mutants to EMS |
536 | pharmacological analysis of cell function (spermatozoan motility) not gene function |
1001 | Vanadate, AMP-PNP, ATP-gamma-S, NEM, Triton X-100, Taxol, analysis on unknown motor protein |
1010 | |
1524 | forskolin, NaF, AlCl3, GTPgammaS, GppNHp, GDPbetaS, pertussis toxin, cholera toxin |
2029 | camptothecin, berenil, trypanocidal drug, magnesium ion on DNA relaxation, and isolated topoisomerase, no gene product mentioned |
2116 | |
3137 | sodium arsenite, mercuric chloride |
3150 | ouabain |
13430 | hematin, benzimidazole; Albendazole, Diethylcarbamazine, Fenbendazole, Hematin, Imidazole, Ivermectin, Levamisole, Mebendazole, Methimazole, Morantel tartrate, Oxibendazole, Piperazine, Pyrantel tartrate, Thiabendazole: only tested in vitro on H. contortus GST activity. |
24940 | aldicarb |
24950 | diacetyl |
25017 | fig 4 |
25173 | Fig 5, Fig 7. |
28357 | yes |
28562 | octopamine |
28879 | serotonin |
28900 | fig.3 |
29114 | yes |
29130 | fig.3, fig.4 |
30726 | isoamyl alcohol |
30928 | fig.2 d) Exogenous serotonin and fluoxetine suppress 100G-induced DAF-16::GFP nuclear accumulation. |
31225 | yes |
31321 | Figure 3. High NaCl accelerates aging of C. elegans |
31336 | nitrogen mustard |
31342 | yes |
31419 | yes |
31424 | fig.7 |
31427 | vitamin E |
31428 | yes |
31456 | tunicamycin |
31464 | yes |
31468 | fig.3 |
31474 | yes |
31482 | FUDR |
31483 | mPyrazine |
31490 | yes |
31509 | yes |
31530 | yes |
31535 | yes |
31571 | yes |
31593 | yes |
31604 | nicotine |
31626 | yes |
31627 | yes |
31644 | yes |
31657 | yes |
31667 | yes |
31669 | ethanol |
31672 | yes |
31682 | yes |
31683 | yes |
31690 | yes |
31694 | yes |
31703 | haem |
31810 | yes |
31824 | yes |
31834 | yes |
31850 | fluoranthene |
31857 | yes |
31866 | yes |
31871 | DHP |
31872 | aldicarb |
31873 | arsenite |
31882 | 8-Br-cGMP |
31895 | Figure 4. Responses of wild-type (N2) and slo-1 mutant worms to aldicarb and levamisole |
31897 | fig.4 |
31915 | yes |
31924 | yes |
31939 | table 2 |
31941 | paraquat |
31959 | yes |
31977 | imipramine |
31982 | glutamate |
31991 | Flavone (2-phenyl chromone) |
31992 | NaCl |
31994 | arsenite |
31996 | fig.1 |
31999 | diltiazem |
32000 | arecoline |
32007 | levamisole |
32008 | aldicarb |
32011 | aldicarb |
32031 | NaCl |
32033 | yes |
32035 | yes |
32050 | yes |
32072 | amiloride |
32077 | C17ISO |
32079 | Dextran-HCC |
32093 | Quercetin |
32101 | cadmium, chlorpyrifos, nickel, prochloraz, diuron |
32103 | yes |
32125 | imidacloprid, thiacloprid |
32131 | yes |
32142 | phosphine, FCCP (carbonyl cyanide 4-(trifluoromethoxy)phenylhydrazone), PCP (2,3,4,5,6-pentachlorophenol), DNP (2,4-dinitrophenol), sodium azide |
32143 | yes |
32181 | Table S2. Activity of the ascarosides in Daf-c and Daf-d strain backgrounds in the dauer formation assay |
32192 | tunicamycin |
32207 | 1NA-PP1 |
32215 | butanone, benzaldehyde |
32232 | paraquat |
32237 | paraquat |
32241 | paraquat |
32243 | ceramide |
32252 | aldicarb |
32255 | H2O2 |
32259 | 3-methyladenine |
32266 | metabolite DA (dafachronic acid), steroids |
32271 | ethanol aldicarb |
32286 | root exudate |
32295 | Ethosuximide |
32319 | NaCl and isoamyl alcohol (chemoattractants). |
32335 | serotonin |
32336 | benzaldehyde |
32358 | aldicarb |
32359 | ivermectin |
32366 | levamisole, nicotine |
32390 | flavonoid quercetin |
32427 | volatile anesthetic, halothane |
32429 | 1-octanol |
32470 | Species specificity of dauer pheromone extracts between elegans and Pristionchus. Response of Pristionchus and Strongyloides to different dafachronic acids. |
32475 | mianserin methiothepin |
32478 | levamisole |
32491 | amino acids |
32494 | AgNO3, CdCl2, CrCl2, CoCl2, CuSO4, HgCl2, MnCl2, NiSO4, Pb(NO3)2, and ZnCl2 |
32508 | chlorpyrifos oxon; cadmium chloride; hexachlorophene; neurotoxicants; chlorpyrifos, methyl mercury, chlordiazepoxide, tebuconazole; cocaine; metals, ethanol, solvents, organophosphate; carbamate pesticides |
32517 | Fig 1. The structures of the dauer pheromone components, ascaroside C6 (1), ascaroside C9 (2), ascaroside C3 (3), and the less active ascaroside C7 (4). Fig 2. The chemical structures of the long-chain ascarosides from conditioned medium extracts from long-term dhs-28 cultures. The structural assignment of long-chain ascaroside 17 is tentative |
32522 | Sulfonamide CA inhibitors (CAIs) such as acetazolamide AZA, methazolamide MZA or ethoxzolamide EZA, many new sulfonamides (such as GUZ |
32878 | NaN3, benzaldehyde |
32880 | hydrogen peroxide |
32881 | 5-HT; Octanol, DiD |
32884 | paraquat, juglone |
32886 | dafachronic acids; Supp. New compounds reported in this study are: III, S-III, S-V, S-XV, S-XVI, S-XXII, XXIII, S-XXIII, XXIV, XXVII. |
32887 | |
32888 | dichlorvos, an organophosphorus insecticide, acetylcholinesterase, cadmium chloride |
32889 | sulfonamides; CAIs; 2-(hydrazinocarbonyl)-3-substituted-phenyl-1H-indole-5-sulfonamides possessing various 2-,3- or 4-substituted phenyl groups with methyl-, halogeno- and methoxy-functionalities, as well as the perfluorophenyl moiety; AZA and EZA |
32901 | CO2 |
32903 | paraquat |
32918 | Thus, (25R)-Δ7-dafachronic acid (2a) is one order of magnitude more active than (25R)-Δ4-dafachronic acid (1a). Similar to a previous study,5 little or no activity was detected with (25R)-cholestenoic acid (3a) |
32925 | cycloheximide |
32932 | resveratrol peptone |
32949 | Paraquat, tunicamycin |
32956 | Chemicals used on Pratylenchus penetrans: acetic acid, propionic acid, isobutyric acid, n-butyric acid, isovaleric acid, n-valeric acid, n-caproic acid. Chemicals used on C. elegans: acetic acid, n-caproic acid. |
32958 | Fe-exposure |
32968 | tetramisole |
32989 | The levels of superoxide radical (•O2−), in both mitochondria and cytosol, are increased in sod-1(tm776) and sod-1(tm783) mutants. |
32997 | dauer pheromone |
33002 | aldicarb levamisole |
33004 | electrically evoked and light evoked pharyngeal cholinergic post synaptic potentials reduced in amplitude by nicotinic antagonists benzoquinonium chloride and d-tubocurarine. Effect dose dependent. |
33009 | checked |
33024 | chloramine-T (CHT), DTT, H2O2: tested for effects on both KVS-1 activity in vitro and on chemotaxis in vivo. |
33037 | dauer pheromone; 8-bromo-cGMP |
33040 | paraquat, excess O2 |
33049 | Carbamate, Aldicarb, Carbofuran, Oxamyl, Neostigmine, Eserine, Organophosphates, Fenamiphos, Ethoprop, Parathion, Paraoxon, Phorate, Terbufos, meta-chloroperbenzoic acid, m-CPBA, diotioate |
33051 | identification of new daf-22 dependent dauer pheromone and mating pheromones ascr#7, ascr#8, and ascr#6.1 |
33060 | green tea polyphenol epigallocatechin gallate (EGCG) |
33077 | flavopiridol olomoucine II |
33086 | hyperosmotic, sodium chloride, anoxia |
33094 | PMA induces the pnlp-29::gfp transgene |
33099 | dichlorvos, fenamiphos, organophosphates, organophosphorous pesticides, neurotoxicant, mefloquine |
33115 | low oxygen, lactacystin |
33126 | tunicamycin, glucose, deoxyglucose, sorbitol, glucose analog |
33130 | D-ribose affects larval growth |
33158 | Yes. Response to Diacetyl. |
33162 | The uncoupler CCCP (carbonylcyanide- 3-chlorophenylhydrazone) extends lifespan. |
33166 | dietary zinc |
33168 | Antipsychotic drugs, cyclosporin A |
33189 | Iron, PQS, PQS+Fe3 |
33433 | copper |
33441 | Pb, Hg, Cd, and Cr metals; CdCl2, CrCl2, HgCl2, and Pb(NO3)2 in solution were used here: 2.5 µM, 50 µM, and 100 µM. |
33448 | Catechin Hydrogen peroxide catechin hydrate |
33456 | 8-bromo cGMP, paraquat, vinpocetine, zaprinast, EHNA (erythro-9-[2-hydroxy-3-nonyl] adenine) |
34686 | CdCl2, CrCl2, HgCl2, Pb(NO3)2 |
34687 | Screened ~54,000 chemicals from various libraries: Bioactives, natural product extracts, Analyticon purified natural product compounds, Diversity-oriented synthesis, ChemBridge kinases, ChemDiv, TimTec, MayBridge, ChemBridge. note: screened on glp-4(bn2);sek-1(km4) mutant worms Anti-infective hits that cure C. elegans of an E. faecalis infection at a concentration lower than the in vitro MIC with E. faecalis are grouped into 6 structural classes (representative structures shown). |
34688 | Bis-[4-methoxy-3- [3-(4-fluorophenyl)-6-(4-methylphenyl)-2(aryl)-tetrahydro-2Hpyrazolo[ 3,4-d]thiazol-5-yl]phenyl]methanes nematicidal activity |
34706 | Exposure to examined metals caused severe lethality toxicities in L1- and L2-larvae... |
34717 | fig 1, 40 µM juglone caused a significant increase in lifespan |
34757 | juglone, paraquat |
34758 | Rotenone, paraquat |
34766 | Cry5B toxin |
35074 | tribendimidine, levamisole, pyrantel |
35082 | drugs: muscimol and serotonin |
35083 | myxothiazol, FCCP |
35098 | Rib1P, 2-deoxy-α-D-ribose 1-phosphate (dRib1P), uridine, UMP, UDP, UTP, ATP, 2-deoxyuridine, 5-FU, 5dFUR, orotidine 5′-phosphate, PPRP and 5-FU |
35114 | pcm-1 mutant dauer larvae exposed to juglone develop into adults with a defect in egg-laying (Egl). pcm-1 mutant eggs exposed to the oxidizing agents paraquat, homocysteine, and homocysteine thiolactone undergo a developmental delay more pronounced than that observed in wild-type animals. pcm-1 mutant eggs exposed to homocysteine develop into adults with a defect in egg-laying. |
35522 | 2-nonanone |
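
The table above lists papers by number only, while the .ace example uses full identifiers such as WBPaper00040819. A minimal sketch for turning table rows into full identifiers follows. It assumes that the WBPaperID column holds the numeric part of an 8-digit, zero-padded WBPaper identifier and that comments contain no "|" character; the function name `parse_flagged_rows` is hypothetical.

```python
def parse_flagged_rows(text):
    """Return (wbpaper_id, comment) pairs from 'ID | comment |' table rows."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if not cells[0].isdigit():
            continue  # header row, blank line, or other non-data line
        comment = cells[1] if len(cells) > 1 else ""
        # Assumption: pad the number to 8 digits, as in WBPaper00040819
        rows.append((f"WBPaper{int(cells[0]):08d}", comment))
    return rows


if __name__ == "__main__":
    table = "WBPaperID | Comment |\n461 | tetramisole |\n1010 | |\n31669 | ethanol |"
    for paper_id, comment in parse_flagged_rows(table):
        print(paper_id, comment or "(no comment)")
```

Rows with an empty comment are kept, since an empty comment still marks the paper as flagged.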