Pipeline for identifying papers with drugs

From WormBaseWiki

Revision as of 21:15, 6 August 2012

  • The initial plan was to use the 'molecules' list to identify papers with drugs, but the output was overloaded with papers about biomolecules, so that list does not work for drugs.

Building the drug lexicon: The following sources were used:

http://antibiotics.toku-e.com/antimicrobial

  • Antiparasitic drugs: Aldicarb, Ivermectin, Levamisole
  • Antidepressants, anticonvulsants, antipsychotics and psychoactive drugs:

http://en.wikipedia.org/wiki/List_of_antidepressants

http://www.nimh.nih.gov/health/publications/mental-health-medications/alphabetical-list-of-medications.shtml

http://en.wikipedia.org/wiki/Psychoactive_drug (from the table under the heading 'Affected neurotransmitter systems', capturing the 'Classification' and 'Examples' columns)

http://www.surgeryencyclopedia.com/Fi-La/Immunosuppressant-Drugs.html

Exceptions: skip pure numbers.
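A minimal sketch of how the term lists gathered from the sources above might be merged into one lexicon, assuming the terms have already been copied into a plain Python list of strings. `build_lexicon` and `PURE_NUMBER` are hypothetical names used only for illustration; this is not part of Textpresso. It applies the "skip pure numbers" exception and drops case-insensitive duplicates.

```python
import re

# Hypothetical rule: integers or decimals such as "1234" or "2.5" count as pure numbers.
PURE_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def build_lexicon(raw_terms):
    """Clean raw term lists into one lexicon: collapse whitespace, apply the
    exception rule (skip pure numbers), and de-duplicate case-insensitively,
    keeping the first spelling seen."""
    lexicon = {}
    for term in raw_terms:
        term = " ".join(term.split())  # trim and collapse internal whitespace
        if not term or PURE_NUMBER.fullmatch(term):
            continue  # skip empty entries and pure numbers
        lexicon.setdefault(term.lower(), term)
    return sorted(lexicon.values(), key=str.lower)


if __name__ == "__main__":
    raw = ["Ivermectin", "  Levamisole ", "ivermectin", "1234", "Aldicarb", "", "2.5"]
    print(build_lexicon(raw))  # ['Aldicarb', 'Ivermectin', 'Levamisole']
```

Keeping the first spelling seen preserves the capitalisation used in the source list while still treating "Ivermectin" and "ivermectin" as one entry.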


All output files are at: http://textpresso-dev.caltech.edu/disease/drug_disease/output/

08/06/12:

Notes:

  • Need to add Resveratrol, Ginkgo biloba and the anticonvulsants Ethosuximide and Trimethadione, if not already present in the lists.
  • Will drop the nutritional supplement list above: it is huge (thousands of terms), too big to clean, and the Textpresso run output from it is really bad.
  • After dropping the supplement list, the script was re-run; the following terms still cause problems:
AGa, ATP, alanine, acetate, aspartic acid, Ca2, bovine serum albumin,
biotin, chloroform, choline, date, deoxyribonucleic acid, DRAKE, ethanol,
EDTA, fluoride, histidine, glycine, hydroxyapatite, same, soma, ROS,
NADH, sodium dodecyl sulfate, violet, nucleotides, nucleic, tetramisole,
tyrosine, tryptophan, P10,
pears, potassium, protease, PEPE, sodium phosphate, GABA, glycerol, phenylalanine,
pyruvic acid, glutamate, glutathione, baker's yeast, protamine, lysine,
fatty acid, methionine, nitrogen, succinic acid, sulphate, oatmeal,
cholinergic, acetic acid, amber, serine, steroid, calcium, AMP, constancy,
liver extract, rabbits, valine, Saccharomyces cerevisiae
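One possible way to handle terms like these is an exclusion list applied before matching, combined with whole-word, case-insensitive matching. The sketch below assumes the lexicon is a plain list of strings; `STOPLIST`, `compile_matcher` and `find_terms` are hypothetical names, the stoplist holds only a few of the problem terms above for illustration, and this is not how Textpresso itself performs matching. Several of those terms ("same", "date", "pears") are ordinary English words, which is why they match so often.

```python
import re

# Illustrative stoplist: a handful of the problem terms listed above.
STOPLIST = {"same", "soma", "date", "violet", "pears", "amber", "constancy"}


def compile_matcher(lexicon, stoplist=STOPLIST):
    """Build one case-insensitive regex that matches any lexicon term as a
    whole word, leaving out every term whose lower-cased form is stoplisted."""
    kept = [t for t in lexicon if t.lower() not in stoplist]
    if not kept:
        return re.compile(r"(?!)")  # nothing left: a pattern that never matches
    # Longest terms first, so a multi-word name wins over a shorter term it contains.
    kept.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in kept)
    # Lookarounds instead of \b, so terms that end in punctuation still work.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def find_terms(text, matcher):
    """Return the distinct lexicon hits found in text, lower-cased and sorted."""
    return sorted({m.group(0).lower() for m in matcher.finditer(text)})


if __name__ == "__main__":
    lexicon = ["Ivermectin", "Levamisole", "Aldicarb", "same", "soma"]
    text = ("Worms were exposed to ivermectin and the same dose of aldicarb; "
            "soma size was unchanged.")
    print(find_terms(text, compile_matcher(lexicon)))  # ['aldicarb', 'ivermectin']
    print(find_terms(text, compile_matcher(lexicon, stoplist=set())))
    # ['aldicarb', 'ivermectin', 'same', 'soma']
```

Comparing the two runs shows how much noise a few common-word terms add; which of the listed terms belong in a stoplist remains a judgment call.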

Ideas worth trying:

  • Skip the sections 'Materials and Methods', 'Materials', 'Methods' and 'References'.
  • Once we have the output, filter it by the word 'drug' at the document level.
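The two ideas above could be combined into a two-stage filter. This is a minimal sketch, assuming each paper is available as a list of (section title, body) pairs and that both "drug" and "drugs" count as the word 'drug'; `searchable_text`, `has_lexicon_hit` and `select_paper` are hypothetical helpers, not Textpresso functions. Lexicon hits are counted only outside the skipped sections, while the 'drug' check looks at the whole document.

```python
import re

# Section titles to ignore (compared case-insensitively).
SKIP_SECTIONS = {"materials and methods", "materials", "methods", "references"}
# Assumption: 'drug' and its plural 'drugs' both count as "the word 'drug'".
DRUG_WORD = re.compile(r"\bdrugs?\b", re.IGNORECASE)


def searchable_text(sections):
    """Join section bodies, skipping the excluded sections.
    `sections` is assumed to be a list of (title, body) pairs."""
    return "\n".join(body for title, body in sections
                     if title.strip().lower() not in SKIP_SECTIONS)


def has_lexicon_hit(text, lexicon):
    """True if any lexicon term occurs as a whole word (case-insensitive)."""
    return any(re.search(r"(?<!\w)" + re.escape(t) + r"(?!\w)", text, re.IGNORECASE)
               for t in lexicon)


def select_paper(sections, lexicon):
    """Stage 1: a lexicon hit outside the skipped sections.
    Stage 2: the word 'drug' anywhere in the full document."""
    if not has_lexicon_hit(searchable_text(sections), lexicon):
        return False
    full_text = "\n".join(body for _, body in sections)
    return DRUG_WORD.search(full_text) is not None


if __name__ == "__main__":
    paper = [("Abstract", "We tested ivermectin on worms."),
             ("Methods", "Plates contained levamisole."),
             ("Discussion", "The drug reduced motility.")]
    print(select_paper(paper, ["Ivermectin"]))  # True
    print(select_paper(paper, ["Levamisole"]))  # False: only hit is in Methods
```

Keeping the stages separate makes it easy to check how many papers each step removes.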