Pipeline for identifying papers with drugs
- The initial plan was to use the 'molecules' list to identify papers with drugs, but the output was overloaded with papers about biomolecules, so that list does not work for drugs.
Building the drug lexicon: The following sources were used:
- Antifungal agents: http://en.wikipedia.org/wiki/Antifungal_medication
- Antibiotics--antimicrobial, anti-fungal, anti-viral, anti-parasitic and anti-tumor agents:
- Antiparasitic drugs--Aldicarb, Ivermectin, Levamisole
- Anti-depressants, anticonvulsants, anti-psychotic and psycho-active drugs:
http://en.wikipedia.org/wiki/Psychoactive_drug (Table under the heading Affected neurotransmitter systems, capture columns 'Classification' and 'Examples')
- Anaesthetics: http://en.wikipedia.org/wiki/Anesthetic
- Anticonvulsants--Ethosuximide, Trimethadione
- Alkaloid drugs: http://en.wikipedia.org/wiki/Alkaloid
- Immunosuppressants: http://en.wikipedia.org/wiki/Immunosuppressive_drug
- Nutritional supplements: http://www.rxlist.com/supplements/alpha_a.htm
- For the purpose of the lexicon: use the generic name as the drug name; the trade name will be used as a synonym.
Exceptions: --Skip pure numbers
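The lexicon rules above (generic name as the entry, trade name as a synonym, pure numbers skipped) could be sketched in Python roughly as follows. This is only an illustration, not the script that was actually used: the `build_lexicon` helper and its input format of (generic name, trade name) pairs are assumptions.

```python
import re

# Assumption: a "pure number" is a string made only of digits,
# optionally with a decimal part.
PURE_NUMBER = re.compile(r"\d+(\.\d+)?")


def build_lexicon(pairs):
    """Map each generic drug name to a sorted list of trade-name synonyms.

    `pairs` is an iterable of (generic_name, trade_name) tuples (a
    hypothetical input format); trade_name may be None when no trade
    name is known.
    """
    lexicon = {}
    for generic, trade in pairs:
        generic = generic.strip()
        if not generic or PURE_NUMBER.fullmatch(generic):
            continue  # exception: skip pure numbers
        synonyms = lexicon.setdefault(generic, set())
        if trade and not PURE_NUMBER.fullmatch(trade.strip()):
            synonyms.add(trade.strip())
    return {name: sorted(syns) for name, syns in lexicon.items()}
```

For example, `build_lexicon([("Ivermectin", "Stromectol"), ("12345", None)])` keeps Ivermectin with Stromectol as its synonym and drops the numeric entry.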
All output files are at: http://textpresso-dev.caltech.edu/disease/drug_disease/output/
- Need to add Resveratrol, Ginkgo biloba and the anticonvulsants--Ethosuximide, Trimethadione, if not present in the lists
- Will drop the above nutritional supplement list: it is huge (thousands of terms), too big to clean, and the Textpresso run output with it is really bad.
- After dropping the supplement list, the script was re-run, but there are still problems with the following terms:
AGa, ATP, alanine, acetate, aspartic acid, Ca2, bovine serum albumin, biotin, chloroform, choline, date, deoxyribonucleic acid, DRAKE, ethanol, EDTA, fluoride, histidine, glycine, hydroxyapatite, same, soma, ROS, NADH, sodium dodecyl sulfate, violet, nucleotides, nucleic, tetramisole, tyrosine, tryptophan, P10, pears, potassium, protease, PEPE, sodium phosphate, GABA, glycerol, phenylalanine, pyruvic acid, alanine, glutamate, glutathione, baker's yeast, protamine, lysine, fatty acid, fluoride, methionine, nitrogen, succinic acid, sulphate, oatmeal, cholinergic, acetic acid, amber, serine, steroid, calcium, AMP, constancy liver extract, bovine serum albumin, rabbits, valine, Saccharomyces cerevisiae
For all terms in the exclusion list, include: upper case, lower case, initial letter upper case, plural and singular.
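One way to expand each exclusion term into its upper-case, lower-case, capitalized, singular and plural forms is sketched below in Python. The function names are hypothetical, and the pluralization is deliberately naive (add or strip a final "s"), so irregular plurals and acronyms would still need manual checking.

```python
def case_and_number_variants(term):
    """Return case variants of a term in both singular and plural form.

    Naive assumption: a term ending in a lower-case "s" (but not "ss")
    is already plural; any other term is pluralized by appending "s".
    """
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        singular, plural = term[:-1], term
    else:
        singular, plural = term, term + "s"
    variants = set()
    for form in (singular, plural):
        # original form, all upper, all lower, first letter upper
        variants.update({form, form.upper(), form.lower(), form.capitalize()})
    return variants


def build_exclusion_set(terms):
    """Union of all variants for every term in the exclusion list."""
    exclusion = set()
    for term in terms:
        exclusion |= case_and_number_variants(term.strip())
    return exclusion
```

A candidate match can then be dropped with a plain membership test, for example `"Rabbit" in build_exclusion_set(["rabbits", "ATP"])`.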
08/06/12 Action list:
- Add above words to exclusion list
- Skip the sections: 'Materials and Methods', 'Materials', 'Methods' and 'References'
- Check brackets in the 'Antibiotics' list: remove terms that have brackets in the middle, and drop brackets that come at the end of a term.
- Once we get the output, we will filter it at the document level using the word 'drug'.
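The action items above could be sketched in Python along these lines. Everything here is an assumption for illustration: the helper names are hypothetical, a paper is assumed to arrive as a list of (heading, text) pairs, "dropping brackets at the end" is read as removing a trailing bracketed group, and the document-level filter is read as requiring the word 'drug' or 'drugs' somewhere in the kept text.

```python
import re

SKIPPED_SECTIONS = {"materials and methods", "materials", "methods", "references"}

# A bracketed group, round or square, at the very end of a term.
TRAILING_BRACKET = re.compile(r"\s*[\(\[][^\(\)\[\]]*[\)\]]\s*$")
ANY_BRACKET = re.compile(r"[\(\)\[\]]")


def clean_bracketed_term(term):
    """Strip a trailing bracketed group; return None if brackets remain mid-term."""
    stripped = TRAILING_BRACKET.sub("", term.strip())
    if not stripped or ANY_BRACKET.search(stripped):
        return None  # bracket in the middle (or nothing left): drop the term
    return stripped


def keep_sections(sections):
    """Return the text of every section whose heading is not in the skip list."""
    return [text for heading, text in sections
            if heading.strip().lower() not in SKIPPED_SECTIONS]


def mentions_drug(document_text):
    """Document-level filter: True if 'drug' or 'drugs' occurs as a whole word."""
    return re.search(r"\bdrugs?\b", document_text, re.IGNORECASE) is not None
```

A paper would then be kept only if `mentions_drug(" ".join(keep_sections(sections)))` is true, with the lexicon terms first passed through `clean_bracketed_term`.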