Difference between revisions of "Molecule"

From WormBaseWiki
Revision as of 00:10, 7 July 2010

links to relevant pages
Caltech documentation
Example molecule pages


Molecule Curation

Molecule curation will capture chemical and drug entities that have been shown to affect the biology of the worm, as well as allow users to link to other databases that deal with these molecule entities in greater detail.

  • What we mean by small molecule
    • drug
    • metabolite (primary and secondary)
    • monomers or very small oligomers of nucleic acids, proteins, and polysaccharides
    • "Large collections of small molecules (molecular weight about 600 or less), of similar or diverse nature which are used for high-throughput screening analysis of the gene function, protein interaction, cellular processing, biochemical pathways, or other chemical interactions." (from nlm.nih.gov and wikipedia)

Approved model

This model addresses the small_molecule class, or part of it, that was discussed in the 2007 WB grant.

///////////////////////////small molecule/chemical/drug ////////////////////////////
// 
// ?Molecule
//  * metabolites: precursors, intermediates, or end products of a metabolic pathway
//  * monomeric or very small oligomeric nucleic acids (not RNAi primers), e.g. ATP, ADP, cAMP, GTP, trinucleotide repeats??
//  * chemicals/drugs
//  * minerals, ions, salts
//
////////////////////////////////////////////////////////////////////////////////////
?Molecule  Name ?Text
           Public_name ?Text
           Synonym ?Text
           DB_info Database ?Database ?Database_field ?Accession_number
           Gene_regulation Gene_regulator ?Gene_regulation XREF Molecule_regulator
           Affects_phenotype_of  Variation ?Variation #Evidence
                                 Strain ?Strain #Evidence
                                 Transgene ?Transgene #Evidence
                                 RNAi ?RNAi #Evidence
///////////////////////////////////////////////////////////////////////////////////

Corresponding changes in touched models

/////
?Phenotype_info    Affected_by  Molecule  ?Molecule    #Evidence
/////
/////
?Gene_regulation  Regulator Molecule_regulator   ?Molecule  XREF  Gene_regulator  #Boolean 
/////

Model elements

  • Name -> MeSH UID
  • Public_name -> common name in the C. elegans literature
  • Synonym -> other names (open question: how do we mine these from other databases?)
  • DB_info -> links to the entity in other databases; the databases listed under "Molecule databases" need to be added to database.ace

Molecule curation

Drug-phenotype curation

Molecules will be linked to genes based on their influence on gene activity altered by variation, overexpression, and RNAi-based knockdown.

Drug-gene interactions

Molecules will also be linked to genes when they influence gene activity directly, through gene regulation interactions.

Molecule databases

Molecule IDs will be provided, when available, for the following databases:

  • Database "NLM_MeSH" "UID"
  • Database "CTD" "ChemicalID"
  • Database "ChemIDplus" using the CasRN
  • Database "ChEBI" "CHEBI_ID"
  • Database "KEGG COMPOUND" "ACCESSION_NUMBER"

Molecule list

Initially, we will use MeSH UIDs, assigned by the NLM, as IDs for the molecules in our database. The NLM was thought to be a good starting point for this project because its molecule coverage is more comprehensive and it is more stably funded. The list we are starting with is a pared-down list of NLM molecules that was created by the Comparative Toxicogenomics Database (CTD) and contains over 130,000 terms. For each term, the list gives a term name, CTD ID, MeSH UID, and, where available, a CAS Registry Number (CasRN). Using the CasRNs, we extracted the ChEBI ID, where one existed, from the Chemical Entities of Biological Interest (ChEBI) entity list, along with any KEGG COMPOUND accession number.

A sample molecule.ace record:

Molecule : "C009687"
Public_name "wortmannin"
Database "NLM_MeSH" "UID" "C009687"
Database "CTD"  "ChemicalID" "C009687"
Database "ChemIDplus"  "19545-26-7"
Database "ChEBI" "CHEBI_ID" "52289"
Database "KEGG COMPOUND" "ACCESSION_NUMBER" "C15181"
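
As a rough illustration of how one row of the term list could be turned into a record like the sample above, here is a minimal Python sketch. The function name `format_molecule_record` and its parameters are hypothetical, not part of any WormBase tool; the tag and database names are copied from the sample record, and optional IDs are simply omitted when they are not available.

```python
def format_molecule_record(mesh_uid, public_name, ctd_id=None,
                           cas_rn=None, chebi_id=None, kegg_id=None):
    """Return one molecule.ace record as text (hypothetical helper).

    The MeSH UID is used as the object name, as described in the text.
    Database lines are only written for IDs that are actually provided.
    """
    lines = [
        f'Molecule : "{mesh_uid}"',
        f'Public_name "{public_name}"',
        f'Database "NLM_MeSH" "UID" "{mesh_uid}"',
    ]
    if ctd_id:
        lines.append(f'Database "CTD" "ChemicalID" "{ctd_id}"')
    if cas_rn:
        # ChemIDplus is referenced by the CasRN alone, with no field name.
        lines.append(f'Database "ChemIDplus" "{cas_rn}"')
    if chebi_id:
        lines.append(f'Database "ChEBI" "CHEBI_ID" "{chebi_id}"')
    if kegg_id:
        lines.append(f'Database "KEGG COMPOUND" "ACCESSION_NUMBER" "{kegg_id}"')
    return "\n".join(lines)


# Values taken from the sample record above.
record = format_molecule_record(
    "C009687", "wortmannin",
    ctd_id="C009687", cas_rn="19545-26-7",
    chebi_id="52289", kegg_id="C15181",
)
print(record)
```

A term with no CasRN would get only the Molecule, Public_name, and NLM_MeSH lines plus its CTD line, which matches the "when available" wording used for the database links.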

To make a working list of reference molecules for the various curation efforts, we used Textpresso to scan the C. elegans corpus for every term on the list. The resulting list has fewer than 6,000 terms. The terms identified in the corpus are available in two forms:
http://textpresso-dev.caltech.edu/michael/molecule-obo-analysis/By-Frequency/ is a directory of term files grouped by the number of times each term appears in the corpus.
http://textpresso-dev.caltech.edu/michael/molecule-obo-analysis/By-Frequency/all is a single file that concatenates all terms from those files.
The concatenated file is being used as the starting file for molecule look-up by WB curators.
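
The filtering step described above can be pictured with a small Python sketch. This is not how Textpresso works internally; it is a simplified stand-in that counts whole-term, case-insensitive matches in a list of document strings, keeps only terms that occur at least once, and orders them by frequency. The function names and the toy documents are assumptions made for illustration.

```python
import re
from collections import Counter


def count_term_mentions(terms, documents):
    """Count how often each term occurs across the documents.

    Matching is case-insensitive and requires that the term is not
    embedded inside a longer word. Terms with no matches are left out.
    """
    counts = Counter()
    for term in terms:
        pattern = re.compile(
            r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE
        )
        total = sum(len(pattern.findall(doc)) for doc in documents)
        if total:
            counts[term] = total
    return counts


def working_list(terms, documents):
    """Return (term, count) pairs, most frequent first, ties by name."""
    counts = count_term_mentions(terms, documents)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0].lower()))


# Toy corpus; real input would be the full-text C. elegans literature.
toy_terms = ["aldicarb", "wortmannin", "paraquat"]
toy_documents = [
    "Aldicarb resistance was scored on aldicarb plates.",
    "Worms were exposed to paraquat.",
]
for term, count in working_list(toy_terms, toy_documents):
    print(term, count)
```

Terms that never appear in the corpus, such as "wortmannin" in this toy example, drop out of the working list, which is how a list of over 130,000 terms can shrink to a few thousand.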
Caveats and notes:

  • The list is now small enough that, if we wanted to load it into WB, we would at least know that every term has some relevance to the literature (although this is unverified).
  • The list is small enough to be edited with ontology editors such as OBO-Edit (even though it is not an ontology).
  • We do not have definitions of the terms, nor are the terms arranged in any hierarchy; other databases provide both, and we link to those websites when an ID is available.
  • Terms and synonyms will be added as needed. This curation effort still needs to be worked out; ideally the list will be incorporated as a selection list in whatever curation tool a curator is using.

Molecule Curation Pipeline

Before each upload, the molecule.ace file must be generated. This file needs to contain all the molecule references within the RNAi and Variation Phenotype curation, merged with the molecule data from the molecule list.
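
One way to picture this merge step is the Python sketch below. It assumes, hypothetically, that the molecule IDs referenced in RNAi and Variation phenotype curation have already been collected as MeSH UIDs, and that the molecule list is available as a mapping from MeSH UID to the .ace lines for that molecule. The function name, the input shapes, and the placeholder ID "UNLISTED_1" are illustrative assumptions, not part of the actual pipeline.

```python
def build_molecule_ace(referenced_ids, molecule_list):
    """Merge curated molecule references with the molecule list.

    referenced_ids: iterable of MeSH UIDs used in curation (may repeat).
    molecule_list:  dict mapping MeSH UID -> list of .ace lines
                    (Public_name, Database ...) for that molecule.

    Returns (ace_text, missing), where missing holds referenced IDs
    that are absent from the molecule list. Those IDs still get a stub
    record so that the reference does not point at nothing.
    """
    records = []
    missing = []
    for uid in sorted(set(referenced_ids)):
        body = molecule_list.get(uid)
        if body is None:
            missing.append(uid)
            body = []
        records.append("\n".join([f'Molecule : "{uid}"', *body]))
    # .ace records are separated by blank lines.
    return "\n\n".join(records) + "\n", missing


# Hypothetical inputs for illustration.
rnai_refs = ["C009687"]
variation_refs = ["UNLISTED_1", "C009687"]
molecule_list = {
    "C009687": [
        'Public_name "wortmannin"',
        'Database "NLM_MeSH" "UID" "C009687"',
    ],
}

ace_text, missing = build_molecule_ace(rnai_refs + variation_refs, molecule_list)
print(ace_text)
print("Not in molecule list:", missing)
```

Reporting the missing IDs separately gives curators a list of terms to add to the molecule list; whether stub records should be written at all is a design choice that the pipeline would need to settle.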

Papers flagged for chemical/molecule curation

WBPaperID Comment
461 tetramisole
464 Yes
484 Yes
493 enhanced sensitivity of flu-2 mutants to EMS
536 pharmacological analysis of cell function (spermatozoan motility) not gene function
1001 Vanadate, AMP-PNP, ATP-gamma-S. NEM, Triton X-100, Taxol, analysis on unknown motor protein
1010
1524 forskolin, NaF, AlCl3, GTPgammaS, GppNHp, GDPbetaS, pertussis toxin, cholera toxin
2029 camptothecin, berenil, trypanocidal drug, magnesium ion on DNA relaxation, and isolated topoisomerase, no gene product mentioned
2116
3137 sodium arsenite, mercuric chloride
3150 ouabain
13430 hematin, benzimidazole, Albendazole Diethylcarbamazine Fenbendazole Hematin Imidazole Ivermectin Levamisole Mebendazole Methimazole Morantel tartrate Oxibendazole Piperazine Pyrantel tartrate Thiabendazole: only tested in vitro on H. contortus GST activity.
24940 aldicarb
24950 diacetyl
25017 fig 4
25173 Fig 5, Fig 7.
28357 yes
28562 octopamine
28879 serotonin
28900 fig.3
29114 yes
29130 fig.3, fig.4
30726 isoamyl alcohol
30928 fig.2 d) Exogenous serotonin and fluoxetine suppress 100G-induced DAF-16::GFP nuclear accumulation.
31225 yes
31321 Figure 3. High NaCl accelerates aging of C. elegans
31336 nitrogen mustard
31342 yes
31419 yes
31424 fig.7
31427 vitamin E
31428 yes
31456 tunicamycin
31464 yes
31468 fig.3
31474 yes
31482 FUDR
31483 mPyrazine
31490 yes
31509 yes
31530 yes
31535 yes
31571 yes
31593 yes
31604 nicotine
31626 yes
31627 yes
31644 yes
31657 yes
31667 yes
31669 ethanol
31672 yes
31682 yes
31683 yes
31690 yes
31694 yes
31703 haem
31810 yes
31824 yes
31834 yes
31850 fluoranthene
31857 yes
31866 yes
31871 DHP
31872 aldicarb
31873 arsenite
31882 8-Br-cGMP
31895 Figure 4. Responses of wild-type (N2) and slo-1 mutant worms to aldicarb and levamisole
31897 fig.4
31915 yes
31924 yes
31939 table 2
31941 paraquat
31959 yes
31977 imipramine
31982 glutamate
31991 Flavone (2-phenyl chromone)
31992 NaCl
31994 arsenite
31996 fig.1
31999 diltiazem
32000 arecoline
32007 levamisole
32008 aldicarb
32011 aldicarb
32031 NaCl
32033 yes
32035 yes
32050 yes
32072 amiloride
32077 C17ISO
32079 Dextran-HCC
32093 Quercetin
32101 cadmium, chlorpyrifos, nickel, prochloraz, diuron
32103 yes
32125 imidacloprid, thiacloprid
32131 yes
32142 phosphine, FCCP (carbonyl cyanide 4-(trifluoromethoxy)phenylhydrazone), PCP (2,3,4,5,6-pentachlorophenol), DNP (2,4-dinitrophenol), sodium azide
32143 yes
32181 Table S2. Activity of the ascarosides in Daf-c and Daf-d strain backgrounds in the dauer formation assay
32192 tunicamycin
32207 1NA-PP1
32215 butanone, benzaldehyde
32232 paraquat
32237 paraquat
32241 paraquat
32243 ceramide
32252 aldicarb
32255 H2O2
32259 3-methyladenine
32266 metabolite DA (dafachronic acid), steroids
32271 ethanol aldicarb
32286 root exudate
32295 Ethosuximide
32319 NaCl and isoamyl alcohol (chemoattractants).
32335 serotonin
32336 benzaldehyde
32358 aldicarb
32359 ivermectin
32366 levamisole, nicotine
32390 flavonoid quercetin
32427 volatile anesthetic, halothane
32429 1-octanol
32470 Species specificity of dauer pheromone extracts between elegans and Pristionchus. Response of Pristionchus and Strongyloides to different dafachronic acids.
32475 mianserin methiothepin
32478 levamisole
32491 amino acids
32494 AgNO3, CdCl2,CrCl2, CoCl2, CuSO4, HgCl2, MnCl2, NiSO4, Pb(NO3)2,and ZnCl2
32508 chlorpyrifos oxon; cadmium chloride; hexachlorophene; neurotoxicants; chlorpyrifos, methyl mercury, chlordiazepoxide, tebuconazol; cocaine; metals, ethanol, solvents, organophosphate; carbamate pesticides
32517 Fig 1.The structures of the dauer pheromone components, ascaroside C6 (1), ascaroside C9 (2), ascaroside C3 (3), and the less active ascaroside C7 (4). Fig 2. The chemical structures of the long-chain ascarosides from conditioned medium extracts from long-term dhs-28 cultures. The structural assignment of long-chain ascaroside 17 is tentative
32522 Sulfonamide CA inhibitors (CAIs) such as acetazolamide AZA, methazolamide MZA or ethoxzolamide EZA, many new sulfonamides (such as GUZ
32878 NaN3, benzaldehyde
32880 hydrogen peroxide
32881 5-HT; Octanol, DiD
32884 paraquat, juglone
32886 dafachronic acids; Supp. New compounds reported in this study are: III, S-III, S-V, S-XV, S-XVI, S-XXII, XXIII, S-XXIII, XXIV, XXVII.
32887
32888 dichlorvos, an organophosphorus insecticide, acetylcholinesterase, cadmium chloride
32889 sulfonamides; CAIs; 2-(hydrazinocarbonyl)-3-substituted-phenyl-1H-indole-5-sulfonamides possessing various 2-,3- or 4-substituted phenyl groups with methyl-, halogeno- and methoxy-functionalities, as well as the perfluorophenyl moiety; AZA and EZA
32901 CO2
32903 paraquat
32918 Thus, (25R)-Δ7-dafachronic acid (2a) is one order of magnitude more active than (25R)-Δ4-dafachronic acid (1a). Similar to a previous study, little or no activity was detected with (25R)-cholestenoic acid (3a)
32925 cycloheximide
32932 resveratrol peptone
32949 Paraquat, tunicamycin
32956 These chemicals used on Pratylenchus penetrans: Acetic acid Propionic acid Isobutyric acid n-Butyric acid Isovaleric acid n-Valeric acid n-Caproic acid These chemicals used on C. elegans Acetic acid n-Caproic acid
32958 Fe-exposure
32968 tetramisole
32989 The levels of superoxide radical (O2•−), in both mitochondria and cytosol, are increased in sod-1(tm776) and sod-1(tm783) mutants.
32997 dauer pheromone
33002 aldicarb levamisole
33004 electrically evoked and light evoked pharyngeal cholinergic post synaptic potentials reduced in amplitude by nicotinic antagonists benzoquinonium chloride and d-tubocurarine. Effect dose dependent.
33009 checked
33024 chloramine-T (CHT), DTT, H2O2: tested for effects on both KVS-1 activity in vitro and on chemotaxis in vivo.
33037 dauer pheromone; 8-bromo-cGMP
33040 paraquat, excess O2
33049 Carbamate, Aldicarb, Carbofuran, Oxamyl, Neostigmine, Eserine, Organophosphates, Fenomiphos, Ethoprop, Parathion, Paraoxon, Phorate, Terbufos, meta-chlorperbenzoic acid, m-CPBA, diotioate
33051 identification of new daf-22 dependent dauer pheromone and mating pheromones ascr#7, ascr#8, and ascr#6.1
33060 green tea polyphenol epigallocatechin gallate (EGCG) -
33077 flavopiridol olomoucine II
33086 hyperosmotic, sodium chloride, anoxia
33094 PMA induces the pnlp-29::gfp transgene
33099 dichlorvos fenamiphos organophosphates organophosphorous pesticides, neurotoxicant, mefloquine,
33115 low oxygen, lactacystin
33126 tunicamycin, glucose, deoxyglucose, sorbitol, glucose analog
33130 D-ribose affect larval growth
33158 Yes. Response to Diacetyl.
33162 The uncoupler CCCP (carbonylcyanide-3-chlorophenylhydrazone) extends lifespan.
33166 dietary zinc
33168 Antipsychotic drugs, cyclosporin A
33189 Iron, PQS, PQS+Fe3
33433 copper
33441 Pb, Hg, Cd, and Cr metals CdCl2, CrCl2, HgCl2, and Pb(NO3)2 in solution were used here: 2.5M, 50M, and 100M.
33448 Catechin Hydrogen peroxide catechin hydrate
33456 8-bromo cGMP, paraquat, vinpocetine, zaprinast, EHNA (erythro-9-[2-hydroxy-3-nonyl] adenine)
34686 CdCl2, CrCl2, HgCl2, Pb(NO3)2
34687 Screened ~54,000 chemicals from various libraries: Bioactives, natural product extracts, Analyticon purified natural product compounds, Diversity-oriented synthesis, ChemBridge kinases, ChemDiv, TimTec, MayBridge, ChemBridge. note: screened on glp-4(bn2);sek-1(km4) mutant worms Anti-infective hits that cure C. elegans of an E. faecalis infection at a concentration lower than the in vitro MIC with E. faecalis are grouped into 6 structural classes (representative structures shown).
34688 Bis-[4-methoxy-3-[3-(4-fluorophenyl)-6-(4-methylphenyl)-2(aryl)-tetrahydro-2H-pyrazolo[3,4-d]thiazol-5-yl]phenyl]methanes nematicidal activity
34706 Exposure to examined metals caused severe lethality toxicities in L1- and L2-larvae...
34717 fig 1, 40um juglone caused a significant increase in lifespan
34757 juglone, paraquat
34758 Rotenone, paraquat
34766 Cry5B toxin
35074 tribendimidine, levamisole, pyrantel
35082 drugs: muscimol and serotonin
35083 myxothiazol, FCCP
35098 Rib1P, 2-deoxy-α-D-ribose 1-phosphate (dRib1P), uridine, UMP, UDP, UTP, ATP, 2-deoxyuridine, 5-FU, 5dFUR, orotidine 5′-phosphate, PPRP and 5-FU
35114 Pcm-1 mutant dauer larvae exposed to juglone develop into adults with a defect in egg-laying (Egl). Pcm-1 mutant eggs exposed to the oxidizing agents paraquat, homocysteine, and homocysteine thiolactone undergo a developmental delay more pronounced than that observed in wild-type animals. Pcm-1 mutant eggs exposed to homocysteine develop into adults with a defect in egg-laying.
35522 2-nonanone