Molecule

From WormBaseWiki

links to relevant pages
Caltech documentation
Example molecule pages


Molecule Curation

Molecule curation captures chemical and drug entities that have been shown to affect the biology of the worm. We provide links to other databases that cover these molecule entities in greater detail.

  • What we mean by small molecule
    • drug
    • metabolite (primary and secondary)
    • monomers or very small oligomers of nucleic acids, proteins, and polysaccharides
    • "Large collections of small molecules (molecular weight about 600 or less), of similar or diverse nature which are used for high-throughput screening analysis of the gene function, protein interaction, cellular processing, biochemical pathways, or other chemical interactions." (from nlm.nih.gov and wikipedia)

Model changes

Changes for WS250/1

?Molecule	Name	?Text
	Public_name	?Text
	Formula
	Monoisotopic_mass
	IUPAC
	SMILES
	InChi
	Synonym	?Text
	DB_info	Database  ?Database	?Database_field	?Text
	Status 	Detected  #Evidence
		Predicted #Evidence
	Quantity	UNIQUE	Float UNIQUE	Float	#Evidence //for measured amount in animal or that produced an effect
	Range	UNIQUE	Float UNIQUE	Float	#Evidence //for measured range in animal or that produced an effect 
	Origin  Plant     //from HMDB
                Microbial
                Cosmetic
                Toxin/Pollutant
                Food (Human)
                Drug
                Exogenous
                Endogenous   ?Species #Evidence
                Drug_metabolite
        Biofunction //from HMDB       
	##Source	Endogenous ?Species #Evidence
		##Exogenous
		##Pharmaceutical
	Receptor ?Gene
		?Molecule
		?Protein
	Pathway Database	?Database	?Database_field	?Text  //for importing KEGG pathways
	Role	Essential
		Nonessential
		Metabolite
		Regulatory
		Structural
	Regulate_expr_cluster	?Expression_cluster	XREF	Regulated_by_molecule
	WBProcess	?WBProcess	XREF	Molecule
	Affects_phenotype_of	Variation	?Variation	?Phenotype	#Evidence
		Strain	?Strain	?Phenotype	#Evidence
		Transgene	?Transgene	?Phenotype	#Evidence
		RNAi	?RNAi	?Phenotype	#Evidence
		Rearrangement	?Rearrangement	?Phenotype	#Evidence
		Interaction	?Interaction	XREF	Molecule_interactor
	Molecule_use	?Text	#Evidence
	Reference	?Paper	XREF	Molecule
	Remark	?Text	#Evidence

Comments on proposed changes

Please add your thoughts here!

Approved model

 ///////////////////////////small molecule/chemical/drug ////////////////////////////
 // 
 // ?Molecule
 //  * metabolites: precursors, intermediates, or end products of a metabolic pathway
 //  * monomeric or very small oligomeric nucleic acids (not RNAi primers), e.g. ATP, ADP, cAMP, GTP, trinucleotide repeats??
 //  * chemicals/drugs
 //  * minerals, ions, salts
 //
 ////////////////////////////////////////////////////////////////////////////////////
 
 ?Molecule  Name ?Text
           Public_name ?Text
           Synonym ?Text
           DB_info Database ?Database ?Database_field Text
           Gene_regulation Gene_regulator ?Gene_regulation XREF Molecule_regulator
           Regulate_expr_cluster ?Expression_cluster XREF Regulated_by_molecule //Wen WS228
	   WBProcess ?WBProcess XREF Molecule
           Affects_phenotype_of     Variation ?Variation ?Phenotype #Evidence
                                    Strain    ?Strain    ?Phenotype #Evidence
                                    Transgene ?Transgene ?Phenotype #Evidence
                                    RNAi      ?RNAi      ?Phenotype #Evidence
                                    Rearrangement  ?Rearrangement  ?Phenotype  #Evidence  //KY [110602 pad]
	   Interaction	?Interaction	XREF	Molecule_regulator
	   Molecule_use  ?Text  #Evidence
	   Reference ?Paper XREF Molecule
	   Remark ?Text #Evidence

 ///////////////////////////////////////////////////////////////////////////////////

Corresponding changes in touched models

/////
?Phenotype_info    Affected_by  Molecule  ?Molecule    #Evidence
/////
/////
?Gene_regulation  Regulator Molecule_regulator   ?Molecule  XREF  Gene_regulator  #Boolean 
/////

Model elements

  • Name-> WBMol:ID
    • Originally, this field contained the MeSH ID. When no MeSH ID was available, a WBMol ID was assigned, which was supposed to be replaced by the MeSH ID once one became available. Unfortunately, Names are used by scripts to construct URLs for the MeSH and CTD database links, so URLs based on WBMol ID names would either have to be suppressed or would result in bad URLs. In addition, the increased use of Molecule annotation in other curation pipelines led to confusion over object creation. These points of confusion prompted the change of the Name field from default MeSH IDs to WBMol IDs for all objects.
  • Public name -> common name in the C. elegans literature
  • Synonym -> other names used in papers; synonyms are case sensitive. Many other names can be taken from CTD pages. Multiple names need to be pipe separated.
  • DB_info -> links to the entity in other databases. Database URLs need to be added to the github external URL file.
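
The naming rules above can be illustrated with a short sketch. It is written in Python for illustration only (the production code is Perl) and assumes that a WBMol ID is the molecule's pgid zero-padded to 8 digits, as suggested by the sample output on this page (mop 139 is WBMol:00000139). The function names wbmol_id and split_synonyms are hypothetical, as are the synonym strings in the usage lines.

```python
def wbmol_id(pgid: int) -> str:
    """Build a Molecule Name: 'WBMol:' plus the pgid zero-padded to 8 digits (assumed rule)."""
    return f"WBMol:{pgid:08d}"


def split_synonyms(raw: str) -> list[str]:
    """Split a pipe-separated synonym string; case is preserved because synonyms are case sensitive."""
    return [part.strip() for part in raw.split("|") if part.strip()]


# Usage with illustrative values
print(wbmol_id(139))
print(split_synonyms("FUDR | floxuridine"))
```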

Molecule curation

Drug-phenotype curation

Molecules will be linked to genes based on their influence on gene activity altered by variation, overexpression, and RNAi-based knockdown.

Drug-gene interactions

Molecules will also be linked to genes through their influence on gene activity directly through gene regulation interactions.

Molecule databases

Molecule IDs will be provided, when available, for the following databases:

  • Database "NLM_MeSH" "UID"
  • Database "CTD" "ChemicalID"
  • Database "ChemIDplus" using the CasRN
  • Database "ChEBI" "CHEBI_ID"
  • Database "KEGG COMPOUND" "ACCESSION_NUMBER"
  • Database "SMID-DB"

--kjy 20:03, 16 December 2011 (UTC)

syncing with ChEBI

--kjy (talk) 20:21, 30 April 2014 (UTC) The script identifies any molecule in the mop tables that has no mop_chebi ID but has been used in curation, as determined from the paper tables of the mop, app, grg, pro, or rna OAs.

/home/acedb/karen/molecule/chebi_missing/find_missing_chebi_in_other_oa.pl

It checks molecule entries without a ChEBI ID that have data in any of:
app_molecule
grg_moleculeregulator
pro_molecule
rna_molecule
mop_paper

It prints out mop_publicname, mop_molecule, mop_chemi, mop_paper, and mop_smmid, as well as the corresponding papers from the other OAs. All papers for a given molecule object are aggregated and converted to PMIDs where a PMID exists.

The output is written to the file 'out'.

The output looks like this:
 
WBMol:00000139	Oxamniquine
mop	139
mop_chemi	21738-42-1
mop_molecule	D010073
mop_paper	"WBPaper00038322","WBPaper00031181"
aggregatedWBPapers	WBPaper00031181, WBPaper00038322
aggregatedWBPmids	pmid17988075, pmid21500322

WBMol:00000155	icariin
mop	155
mop_chemi	489-32-7
mop_molecule	C056599
mop_paper	"WBPaper00040566"
aggregatedWBPapers	WBPaper00040566
aggregatedWBPmids	pmid22216122
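
The selection and aggregation logic described above can be sketched as follows. This is a minimal Python illustration working on in-memory data, not the Perl script itself; the function name find_missing_chebi, the record layout, and the argument names are assumptions made for the example. The real script reads the postgres tables directly.

```python
def find_missing_chebi(molecules, usage_by_table, pmid_for_paper=None):
    """Report molecules that lack a ChEBI ID but have papers in the molecule OA or other OAs.

    molecules:       list of dicts with keys 'name', 'publicname', 'chebi', 'papers' (hypothetical layout)
    usage_by_table:  {table name: {molecule name: [WBPaper IDs]}} for the other OAs
    pmid_for_paper:  optional {WBPaper ID: PMID string}
    """
    pmid_for_paper = pmid_for_paper or {}
    report = []
    for mol in molecules:
        if mol.get("chebi"):
            continue  # already has a ChEBI ID
        papers = set(mol.get("papers", []))
        for papers_by_molecule in usage_by_table.values():
            papers.update(papers_by_molecule.get(mol["name"], []))
        if not papers:
            continue  # not used in curation
        ordered = sorted(papers)
        report.append({
            "name": mol["name"],
            "publicname": mol["publicname"],
            "aggregatedWBPapers": ordered,
            # convert to PMID only where a PMID exists
            "aggregatedWBPmids": [pmid_for_paper[p] for p in ordered if p in pmid_for_paper],
        })
    return report


# Usage with the icariin entry from the sample output
molecules = [{"name": "WBMol:00000155", "publicname": "icariin",
              "chebi": None, "papers": ["WBPaper00040566"]}]
usage_by_table = {"app_molecule": {}, "grg_moleculeregulator": {},
                  "pro_molecule": {}, "rna_molecule": {}}
report = find_missing_chebi(molecules, usage_by_table,
                            {"WBPaper00040566": "pmid22216122"})
```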

ChEBI terms and IDs are pulled from the chebi.obo on the ChEBI ftp server. This file is updated monthly (first Monday of the month except in the case of a Bank Holiday when it becomes the Tuesday). The ChEBI website is updated daily.
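
As an illustration of how terms and IDs could be read from an OBO file such as chebi.obo, here is a minimal Python sketch. It assumes the standard OBO layout of [Term] stanzas with "id:" and "name:" lines; the function name parse_obo_terms is hypothetical, the sample stanza is hand-written for the example (its ID and name are taken from the wortmannin record on this page), and the actual population script may work differently.

```python
def parse_obo_terms(text: str) -> dict[str, str]:
    """Return {term id: term name} for every [Term] stanza in OBO-formatted text."""
    terms = {}
    in_term = False
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = (line == "[Term]")
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and current_id and line.startswith("name: "):
            terms[current_id] = line[len("name: "):]
    return terms


# Hand-written sample for illustration
sample = """format-version: 1.2

[Term]
id: CHEBI:52289
name: wortmannin

[Typedef]
id: has_part
name: has part
"""
terms = parse_obo_terms(sample)
```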

Molecule OA

For entering, editing, and storing information about molecule objects.

on tazendra or mangolassi:
~postgres/public_html/cgi-bin/oa/wormOA.pm

&initWormFields is like a switch to load a specific OA. Each OA has a 3-letter code that corresponds to the postgres table name prefix, so 'mop' is the code for the molecule OA and 'mop_' is the prefix of its tables in postgres. For mop, it then calls &initWormMopFields.
Currently, around line 1200, there are sets of hashes like:
 $fields{mop}{id}{type}                             = 'text';
 $fields{mop}{id}{label}                            = 'pgid';
 $fields{mop}{id}{tab}                              = 'all';
 $fields{mop}{paper}{type}                          = 'multiontology';
 $fields{mop}{paper}{label}                         = 'WBPaper';
 $fields{mop}{paper}{tab}                           = 'all';
 $fields{mop}{paper}{ontology_type}                 = 'WBPaper';
 $fields{mop}{name}{type}                           = 'text';
 $fields{mop}{name}{label}                          = 'Name';
 $fields{mop}{name}{tab}                            = 'all';

Most of these correspond to a postgres table. The first one, the pgid, is the exception. Excluding it, the example above has 2 postgres tables: mop_paper and mop_name.

  • {type} is the type of data they hold
    • text
    • bigtext
    • ontology
    • dropdown
    • toggle
    • multiontology
    • multidropdown
  • {label} is the field name in the OA
  • {tab} is which tab number it should show in. 'all' shows up in all tabs, but is mostly used in OAs with no numbered tabs
  • {ontology_type} if {type} is ontology / multiontology, ontology_type refers to the type of values that go there for the autocomplete and validation.

  • {ontology_table} if the type is ontology / multiontology
 $fields{mop}{chebi}{type}                          = 'ontology';
 $fields{mop}{chebi}{label}                         = 'ChEBI_ID';
 $fields{mop}{chebi}{tab}                           = 'all';
 $fields{mop}{chebi}{ontology_type}                 = 'obo';
 $fields{mop}{chebi}{ontology_table}                = 'chebi';

which corresponds to the obo_ tables that are updated via cronjob or populated just once. In this case it maps to obo_<name|data|syn>_chebi

  • {dropdown_type} if the {type} is dropdown, which refers to the type of data like the {ontology_type}, but the values are hardcoded further down in the code instead of stored in postgres for querying.
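
The mapping from field definitions to postgres table names described above can be sketched like this. The sketch mirrors the Perl %fields hash as a Python dict purely for illustration; the helper names postgres_tables and obo_tables are hypothetical and do not exist in wormOA.pm.

```python
# Python mirror of the Perl hash entries shown above (illustration only)
fields = {
    "mop": {
        "id":    {"type": "text", "label": "pgid", "tab": "all"},
        "paper": {"type": "multiontology", "label": "WBPaper", "tab": "all",
                  "ontology_type": "WBPaper"},
        "name":  {"type": "text", "label": "Name", "tab": "all"},
        "chebi": {"type": "ontology", "label": "ChEBI_ID", "tab": "all",
                  "ontology_type": "obo", "ontology_table": "chebi"},
    }
}


def postgres_tables(fields, oa):
    """Every field except 'id' (the pgid) corresponds to a table named <oa>_<field>."""
    return [f"{oa}_{field}" for field in fields[oa] if field != "id"]


def obo_tables(spec):
    """For ontology_type 'obo', values come from obo_<name|data|syn>_<ontology_table>."""
    if spec.get("ontology_type") != "obo":
        return []
    return [f"obo_{kind}_{spec['ontology_table']}" for kind in ("name", "data", "syn")]


print(postgres_tables(fields, "mop"))
print(obo_tables(fields["mop"]["chebi"]))
```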

mop_tables

postgres tables for the Molecule class
dumper on tazendra at karen/Molecule/dump_molecule_ace.pl
Model tag | table | type | OA label | tab | ontology_type | ontology_table | cross table population script | comment
pgonly | mop_id | not a table | pgid | all* | - | - | - | WBMol: $molId = &pad8Zeros($newPgid)
pgonly | mop_timestamp | not a table | none | - | - | - | - |
pgonly | mop_curator | dropdown | Curator | all | - | - | - | same values in all OA curation tables
Name | mop_name | text | Name | all | - | - | - | WBMolID, added after tables were built
Public_name | mop_publicname | bigtext | Public_name | all | - | - | - |
Synonym | mop_synonym | bigtext | Synonyms | all | - | - | - |
DB_info | mop_molecule | text | MeSH / CTD or default | all | - | - | - | use MeSH ID here; originally used as Name
DB_info | mop_chemi | text | CasRN | all | - | - | - |
DB_info | mop_chebi | ontology | ChEBI_id | all | obo | chebi | downloaded from ChEBI server, which is updated monthly, ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/chebi.obo |
DB_info | mop_kegg | text | Kegg compound (Acc#) | all | - | - | - |
Remark | mop_remark | bigtext | Remark | all | - | - | - |
Reference | mop_paper | multiontology | WBPaper | all | WBPaper | pap_tables | - |
Molecule_use | mop_moleculeuse | bigtext | Molecule use | all | - | - | - |
Not in model | mop_gotarget | multiontology | GO target | all | obo | goid | - |
Not in model | mop_genetarget | multiontology | Gene target | all | WBGene | gin_tables | - |
DB_info | mop_smmid | text | SMID-DB | all | - | - | - |
Gene_regulation | grg_moleculeregulator | multiontology | Molecule Regulator | GENEREG tab3 | Molecule | mop_tables | - |
WBProcess | pro_molecule | multiontology | Molecule | PROCESS tab1 | Molecule | mop_tables | - |
Phenotype | app_molecule | multiontology | Molecule | PHENOTYPE tab2 | Molecule | mop_tables | - |
RNAi | rna_molecule | multiontology | Molecule | RNAi tab2 | Molecule | mop_tables | - |
Interaction | int_moleculenondir, int_moleculeone, int_moleculetwo | Molecule | Effected molecule, Affected molecule, Nondirectional molecule | INTERACTION tab3 | Molecule | mop_tables | - | Molecules were initially curated through a/effected_other fields and stored as text in a field in tab4.
  • "all" can mean all tabs, but is also used for OAs that only have one tab.

Molecule list

Molecule names have changed to WBMoleculeIDs.
Initially, we will be using MeSH UIDs, assigned by the NLM, as IDs for the molecules in our database. Because the NLM offers more comprehensive coverage of molecules and is more stably funded, this source was thought to be a good starting point for this project. The list we are starting with has been pared down and was created by the Comparative Toxicogenomics Database (CTD), which contains over 130,000 terms. For each term, this list contains a term name, CTD ID, MeSH UID, and, where available, CAS Registry Numbers (CasRNs). Using the CasRNs, we extracted the ChEBI ID from the Chemical Entities of Biological Interest database entity list, where it existed, along with any KEGG Compound accession number.

A sample molecule.ace record:

Molecule : "C009687"
Public_name "wortmannin"
Database "NLM_MeSH" "UID" "C009687"
Database "CTD"  "ChemicalID" "C009687"
Database "ChemIDplus"  "19545-26-7"
Database "ChEBI" "CHEBI_ID" "52289"
Database "KEGG COMPOUND" "ACCESSION_NUMBER" "C15181"
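
For illustration, a record in the shape shown above could be assembled from its parts as follows. This is a minimal Python sketch, not the actual dumper (dump_molecule_ace.pl is a Perl script); the function name ace_record and its arguments are hypothetical, and the sketch does not escape quotes inside values.

```python
def ace_record(name, public_name, db_links):
    """Render one Molecule entry in the .ace layout shown above.

    db_links is a list of tuples such as (database, field, value); each
    element is written as a quoted string after the Database tag.
    """
    lines = [f'Molecule : "{name}"', f'Public_name "{public_name}"']
    for link in db_links:
        quoted = " ".join(f'"{part}"' for part in link)
        lines.append(f"Database {quoted}")
    return "\n".join(lines)


# Usage with values from the sample record
record = ace_record(
    "C009687",
    "wortmannin",
    [
        ("NLM_MeSH", "UID", "C009687"),
        ("CTD", "ChemicalID", "C009687"),
        ("ChemIDplus", "19545-26-7"),
        ("ChEBI", "CHEBI_ID", "52289"),
        ("KEGG COMPOUND", "ACCESSION_NUMBER", "C15181"),
    ],
)
print(record)
```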

To make a working list of reference molecules for the various curation efforts, we used Textpresso to scan for all terms on the list that have been published in the C. elegans corpus. The resulting list has fewer than 6000 terms. The terms identified in the corpus were compiled, based on their frequency in the corpus, and concatenated into one file.
This last file is being used as a starting file for molecule look-up by WB curators.
Caveats and notes:

  • The list is now small enough that if we wanted to load it into WB at least we know that every term has some relevance to the literature (although unverified).
  • The list is small enough to be amenable to editing through ontology editors like OBOedit (even though it is not an ontology).
  • We do not have definitions of the terms, nor are the terms arranged in any hierarchical manner; however other databases do, and we provide links to those websites if an ID is available.
  • Terms and synonyms of terms will be added as needed. This curation effort still needs to be worked out; ideally, the list will be incorporated as a selection list for whatever curation tool a curator is using.

Molecule Curation Pipeline

Molecule upload

  • Molecule.ace made from karen/Molecule/dump_molecule_ace.pl
  • Database info for the chemical/small molecule database links needs to be manually edited on github in the external_urls file whenever there are changes to the databases associated with molecule data.

During each upload a molecule.ace file will be made in citace by Wen. This file will contain all the molecule cross references from within the RNAi and Variation Phenotype curation, merging them with the molecule data from the molecule list.

Papers flagged for chemical/molecule curation

WBPaperID Comment
461 tetramisole
464 Yes
484 Yes
493 enhanced sensitivity of flu-2 mutants to EMS
536 pharmacological analysis of cell function (spermatozoan motility) not gene function
1001 Vanadate, AMP-PNP, ATP-gamma-S. NEM, Triton X-100, Taxol, analysis on unknown motor protein
1010
1524 forskolin, NaF, AlCl3, GTPgammaS, GppNHp, GDPbetaS, pertussis toxin, cholera toxin
2029 camptothecin, berenil, trypanocidal drug, magnesium ion on DNA relaxation, and isolated topoisomerase, no gene product mentioned
2116
3137 sodium arsenite, mercuric chloride
3150 ouabain
13430 hematin, benzimidazole, Albendazole Diethylcarbamazine Fenbendazole Hematin Imidazole Ivermectin Levamisole Mebendazole Methimazole Morantel tartrate Oxibendazole Piperazine Pyrantel tartrate Thiabendazole: only tested in vitro on H. contortus GST activity.
24940 aldicarb
24950 diacetyl
25017 fig 4
25173 Fig 5, Fig 7.
28357 yes
28562 octopamine
28879 serotonin
28900 fig.3
29114 yes
29130 fig.3, fig.4
30726 isoamyl alcohol
30928 fig.2 d) Exogenous serotonin and fluoxetine suppress 100G-induced DAF-16::GFP nuclear accumulation.
31225 yes
31321 Figure 3. High NaCl accelerates aging of C. elegans
31336 nitrogen mustard
31342 yes
31419 yes
31424 fig.7
31427 vitamin E
31428 yes
31456 tunicamycin
31464 yes
31468 fig.3
31474 yes
31482 FUDR
31483 mPyrazine
31490 yes
31509 yes
31530 yes
31535 yes
31571 yes
31593 yes
31604 nicotine
31626 yes
31627 yes
31644 yes
31657 yes
31667 yes
31669 ethanol
31672 yes
31682 yes
31683 yes
31690 yes
31694 yes
31703 haem
31810 yes
31824 yes
31834 yes
31850 fluoranthene
31857 yes
31866 yes
31871 DHP
31872 aldicarb
31873 arsenite
31882 8-Br-cGMP
31895 Figure 4. Responses of wild-type (N2) and slo-1 mutant worms to aldicarb and levamisole
31897 fig.4
31915 yes
31924 yes
31939 table 2
31941 paraquat
31959 yes
31977 imipramine
31982 glutamate
31991 Flavone (2-phenyl chromone)
31992 NaCl
31994 arsenite
31996 fig.1
31999 diltiazem
32000 arecoline
32007 levamisole
32008 aldicarb
32011 aldicarb
32031 NaCl
32033 yes
32035 yes
32050 yes
32072 amiloride
32077 C17ISO
32079 Dextran-HCC
32093 Quercetin
32101 cadmium, chlorpyrifos, nickel, prochloraz, diuron
32103 yes
32125 imidacloprid, thiacloprid
32131 yes
32142 phosphine, FCCP (carbonyl cyanide 4-(trifluoromethoxy)phenylhydrazone), PCP (2,3,4,5,6-pentachlorophenol), DNP (2,4-dinitrophenol), sodium azide
32143 yes
32181 Table S2. Activity of the ascarosides in Daf-c and Daf-d strain backgrounds in the dauer formation assay
32192 tunicamycin
32207 1NA-PP1
32215 butanone, benzaldehyde
32232 paraquat
32237 paraquat
32241 paraquat
32243 ceramide
32252 aldicarb
32255 H2O2
32259 3-methyladenine
32266 metabolite DA (dafachronic acid), steroids
32271 ethanol aldicarb
32286 root exudate
32295 Ethosuximide
32319 NaCl and isoamyl alcohol (chemoattractants).
32335 serotonin
32336 benzaldehyde
32358 aldicarb
32359 ivermectin
32366 levamisole, nicotine
32390 flavonoid quercetin
32427 volatile anesthetic, halothane
32429 1-octanol
32470 Species specificity of dauer pheromone extracts between elegans and Pristionchus. Response of Pristionchus and Strongyloides to different dafachronic acids.
32475 mianserin methiothepin
32478 levamisole
32491 amino acids
32494 AgNO3, CdCl2,CrCl2, CoCl2, CuSO4, HgCl2, MnCl2, NiSO4, Pb(NO3)2,and ZnCl2
32508 chlorpyrifos oxon; cadmium chloride; hexachlorophene; neurotoxicants; chlorpyrifos, methyl mercury, chlordiazepoxide, tebuconazole; cocaine; metals, ethanol, solvents, organophosphate; carbamate pesticides
32517 Fig 1.The structures of the dauer pheromone components, ascaroside C6 (1), ascaroside C9 (2), ascaroside C3 (3), and the less active ascaroside C7 (4). Fig 2. The chemical structures of the long-chain ascarosides from conditioned medium extracts from long-term dhs-28 cultures. The structural assignment of long-chain ascaroside 17 is tentative
32522 Sulfonamide CA inhibitors (CAIs) such as acetazolamide AZA, methazolamide MZA or ethoxzolamide EZA, many new sulfonamides (such as GUZ
32878 NaN3, benzaldehyde
32880 hydrogen peroxide
32881 5-HT; Octanol, DiD
32884 paraquat, juglone
32886 dafachronic acids; Supp. New compounds reported in this study are: III, S-III, S-V, S-XV, S-XVI, S-XXII, XXIII, S-XXIII, XXIV, XXVII.
32887
32888 dichlorvos,an organophosphorus insecticide, acetylcholinesterase, cadmium chloride
32889 sulfonamides; CAIs; 2-(hydrazinocarbonyl)-3-substituted-phenyl-1H-indole-5-sulfonamides possessing various 2-,3- or 4-substituted phenyl groups with methyl-, halogeno- and methoxy-functionalities, as well as the perfluorophenyl moiety; AZA and EZA
32901 CO2
32903 paraquat
32918 Thus, (25R)-Δ7-dafachronic acid (2a) is one order of magnitude more active than (25R)-Δ4-dafachronic acid (1a). Similar to a previous study,5 little or no activity was detected with (25R)-cholestenoic acid (3a)
32925 cycloheximide
32932 resveratrol peptone
32949 Paraquat, tunicamycin
32956 These chemicals used on Pratylenchus penetrans: Acetic acid Propionic acid Isobutyric acid n-Butyric acid Isovaleric acid n-Valeric acid n-Caproic acid These chemicals used on C. elegans Acetic acid n-Caproic acid
32958 Fe-exposure
32968 tetramisole
32989 The levels of superoxide radical (·O2−), in both mitochondria and cytosol, are increased in sod-1(tm776) and sod-1(tm783) mutants.
32997 dauer pheromone
33002 aldicarb levamisole
33004 electrically evoked and light evoked pharyngeal cholinergic post synaptic potentials reduced in amplitude by nicotinic antagonists benzoquinonium chloride and d-tubocurarine. Effect dose dependent.
33009 checked
33024 chloramine-T (CHT), DTT, H2O2: tested for effects on both KVS-1 activity in vitro and on chemotaxis in vivo.
33037 dauer pheromone; 8-bromo-cGMP
33040 paraquat, excess O2
33049 Carbamate, Aldicarb, Carbofuran, Oxamyl, Neostigmine, Eserine, Organophosphates, Fenamiphos, Ethoprop, Parathion, Paraoxon, Phorate, Terbufos, meta-chloroperbenzoic acid, m-CPBA, diotioate
33051 identification of new daf-22 dependent dauer pheromone and mating pheromones ascr#7, ascr#8, and ascr#6.1
33060 green tea polyphenol epigallocatechin gallate (EGCG) -
33077 flavopiridol olomoucine II
33086 hyperosmotic, sodium chloride, anoxia
33094 PMA induces the pnlp-29::gfp transgene
33099 dichlorvos, fenamiphos, organophosphates, organophosphorus pesticides, neurotoxicant, mefloquine
33115 low oxygen, lactacystin
33126 tunicamycin, glucose, deoxyglucose, sorbitol, glucose analog
33130 D-ribose affects larval growth
33158 Yes. Response to Diacetyl.
33162 The uncoupler CCCP (carbonyl cyanide 3-chlorophenylhydrazone) extends lifespan.
33166 dietary zinc
33168 Antipsychotic drugs, cyclosporin A
33189 Iron, PQS, PQS+Fe3
33433 copper
33441 Pb, Hg, Cd, and Cr metals: CdCl2, CrCl2, HgCl2, and Pb(NO3)2 in solution were used here at 2.5 µM, 50 µM, and 100 µM.
33448 Catechin Hydrogen peroxide catechin hydrate
33456 8-bromo cGMP, paraquat, vinpocetine, zaprinast, EHNA (erythro-9-[2-hydroxy-3-nonyl] adenine)
34686 CdCl2, CrCl2, HgCl2, Pb(NO3)2
34687 Screened ~54,000 chemicals from various libraries: Bioactives, natural product extracts, Analyticon purified natural product compounds, Diversity-oriented synthesis, ChemBridge kinases, ChemDiv, TimTec, MayBridge, ChemBridge. note: screened on glp-4(bn2);sek-1(km4) mutant worms Anti-infective hits that cure C. elegans of an E. faecalis infection at a concentration lower than the in vitro MIC with E. faecalis are grouped into 6 structural classes (representative structures shown).
34688 Bis-[4-methoxy-3-[3-(4-fluorophenyl)-6-(4-methylphenyl)-2(aryl)-tetrahydro-2H-pyrazolo[3,4-d]thiazol-5-yl]phenyl]methanes nematicidal activity
34706 Exposure to examined metals caused severe lethality toxicities in L1- and L2-larvae...
34717 fig 1, 40 µM juglone caused a significant increase in lifespan
34757 juglone, paraquat
34758 Rotenone, paraquat
34766 Cry5B toxin
35074 tribendimidine, levamisole, pyrantel
35082 drugs: muscimol and serotonin
35083 myxothiazol, FCCP
35098 Rib1P, 2-deoxy-a-d-ribose 1-phosphate (dRib1P), uridine, UMP, UDP, UTP, ATP, 2-deoxyuridine, 5-FU, 5dFUR, orotidine 5′-phosphate, PPRP and 5-FU
35114 Pcm-1 mutant dauer larvae exposed to juglone develop into adults with a defect in egg-laying (Egl). Pcm-1 mutant eggs exposed to the oxidizing agents paraquat, homocysteine, and homocysteine thiolactone undergo a developmental delay more pronounced than that observed in wild-type animals. Pcm-1 mutant eggs exposed to homocysteine develop into adults with a defect in egg-laying.
35522 2-nonanone