Pipeline for identifying papers with drugs

From WormBaseWiki

Latest revision as of 21:37, 6 August 2013

  • The initial plan was to use the 'molecules' list to identify papers with drugs, but the output was overloaded with papers about biomolecules, so that list does not work for drugs.

Building the drug lexicon: The following sources were used:

http://antibiotics.toku-e.com/antimicrobial

  • Antiparasitic drugs--Aldicarb, Ivermectin, Levamisole
  • Antidepressants, anticonvulsants, antipsychotics and psychoactive drugs:

http://en.wikipedia.org/wiki/List_of_antidepressants

http://www.nimh.nih.gov/health/publications/mental-health-medications/alphabetical-list-of-medications.shtml

http://en.wikipedia.org/wiki/Psychoactive_drug (Table under the heading Affected neurotransmitter systems, capture columns 'Classification' and 'Examples')

http://www.surgeryencyclopedia.com/Fi-La/Immunosuppressant-Drugs.html

Exceptions: --Skip pure numbers
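
The "skip pure numbers" exception could be sketched as below. This is only an illustration, not the script actually used: the function names `is_pure_number` and `drop_pure_numbers` are hypothetical, and treating digits with optional sign, decimal point or comma separators as a "pure number" is an assumption.

```python
import re

# Assumption: a "pure number" consists only of digits, optionally with a
# leading sign and decimal points or commas between digit groups.
_PURE_NUMBER = re.compile(r"[+-]?\d+(?:[.,]\d+)*")


def is_pure_number(term: str) -> bool:
    """Return True if the term is purely numeric and should be skipped."""
    return _PURE_NUMBER.fullmatch(term.strip()) is not None


def drop_pure_numbers(terms):
    """Keep only the lexicon terms that are not pure numbers."""
    return [t for t in terms if not is_pure_number(t)]
```

Terms that merely contain digits, such as Ca2 or P10, are not pure numbers under this rule and are kept; they would have to be handled through the exclusion list instead.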


All output files are at: http://textpresso-dev.caltech.edu/disease/drug_disease/output/

08/06/12:

Notes:

  • Need to add Resveratrol, Ginkgo biloba and the anticonvulsants Ethosuximide and Trimethadione, if not already present in the lists
  • Will drop the above nutritional supplement list: it is huge (thousands of terms), too big to clean, and the Textpresso run output with it is really bad.
  • After dropping the supplement list and re-running the script, the following terms still cause problems:
AGa, ATP, alanine, acetate, aspartic acid, Ca2, bovine serum albumin,
biotin, chloroform, choline, date, deoxyribonucleic acid, DRAKE, ethanol,
EDTA, fluoride, histidine, glycine, hydroxyapatite, same, soma, ROS,
NADH, sodium dodecyl sulfate, violet, nucleotides, nucleic, tetramisole,
tyrosine, tryptophan, P10,
pears, potassium, protease, PEPE, sodium phosphate, GABA, glycerol, phenylalanine,
pyruvic acid, alanine, glutamate, glutathione, baker's yeast, protamine, lysine,
fatty acid, fluoride, methionine, nitrogen, succinic acid, sulphate, oatmeal,
cholinergic, acetic acid, amber, serine, steroid, calcium, AMP, constancy,
liver extract, bovine serum albumin, rabbits, valine, Saccharomyces cerevisiae
For every term in the exclusion list, include the upper-case, lower-case and initial-capital forms, in both singular and plural.
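
The case and number variants requested for the exclusion list could be generated along these lines. This is a rough sketch under stated assumptions: the function names are hypothetical, the plural and singular rules are deliberately naive (add or strip a trailing lower-case "s"), and "beginning letter upper case" is read as first letter capitalised with the rest in lower case.

```python
def naive_plural(term: str) -> str:
    """Assumption: pluralise by appending 's' unless the term already ends in 's'/'S'."""
    return term if term.lower().endswith("s") else term + "s"


def naive_singular(term: str) -> str:
    """Assumption: singularise by stripping one trailing lower-case 's' (but not 'ss')."""
    if term.endswith("s") and not term.endswith("ss") and len(term) > 3:
        return term[:-1]
    return term


def expand_variants(term: str) -> set:
    """Return the case and singular/plural variants of one exclusion term."""
    forms = {term, naive_singular(term), naive_plural(term)}
    variants = set()
    for form in forms:
        # original, upper case, lower case, initial capital
        variants.update({form, form.upper(), form.lower(), form.capitalize()})
    return variants


def expand_exclusion_list(terms) -> set:
    """Union of the variants of every term in the exclusion list."""
    expanded = set()
    for term in terms:
        expanded |= expand_variants(term)
    return expanded
```

For example, `expand_variants("pears")` contains "pear", "Pear", "PEAR", "pears", "Pears" and "PEARS". Irregular plurals are not handled by this simple rule and would need to be added by hand.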

08/06/12 Action list:

  • Add above words to exclusion list
  • Skip the sections: 'Materials and Methods', 'Materials', 'Methods' and 'References'
  • Check brackets in the 'Antibiotics' list, removing terms with brackets in the middle and dropping brackets at the end.
  • Once we get the output, we will filter it by requiring the word 'drug' at document level.
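
The section-skipping, bracket-cleaning and document-level filtering steps in the action list might look roughly like this. It is a sketch only: all function names are hypothetical, the representation of a paper as (heading, text) pairs is an assumption about the input and not Textpresso's actual format, "dropping brackets at the end" is read as removing a trailing bracketed group, and matching the plural "drugs" as well as "drug" is also an assumption.

```python
import re

# Section headings to skip, as listed in the action list (compared case-insensitively).
SKIPPED_SECTIONS = {"materials and methods", "materials", "methods", "references"}


def keep_sections(sections):
    """sections: iterable of (heading, text) pairs; drop the skipped headings."""
    return [(h, t) for h, t in sections
            if h.strip().lower() not in SKIPPED_SECTIONS]


def clean_bracketed_terms(terms):
    """Strip a trailing bracketed group; drop terms with brackets in the middle."""
    cleaned = []
    for term in terms:
        # Remove one bracketed group at the very end, e.g. "name (note)" -> "name".
        stripped = re.sub(r"\s*[\(\[][^\(\)\[\]]*[\)\]]$", "", term.strip())
        # Any bracket still present sits in the middle of the term: drop the term.
        if re.search(r"[\(\)\[\]]", stripped):
            continue
        if stripped:
            cleaned.append(stripped)
    return cleaned


def mentions_drug(document_text: str) -> bool:
    """Document-level filter: True if 'drug' or 'drugs' occurs as a whole word."""
    return re.search(r"\bdrugs?\b", document_text, flags=re.IGNORECASE) is not None
```

With made-up placeholder terms, `clean_bracketed_terms(["drugA (salt)", "drug (x) B", "drugC"])` keeps "drugA" and "drugC" and drops the middle entry.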

Back to Disease and Drugs