Attaching Genes to Papers
Gene Associations Based Upon Abstracts
When papers are added to Postgres using the Enter New Papers function of the Paper Editor, a script scans the corresponding abstracts for matches to loci, sequence names, and synonyms.
The Postgres tables used for this matching are:
gin_locus
gin_seqname
gin_synonyms
To view the contents of these tables, run a query of the following form using referenceform.cgi:
SELECT * FROM gin_locus;
Updating the script:
The script that associates genes based upon abstracts misses some genes because of the way they are written in the abstract.
One idea for what to change:
1) Separate text from all punctuation except the dash (-), then look for matches to the approved gene list.
For example, fem-1(hc17ts) would become fem-1 ( hc17ts )
and DAF-18/PTEN would become DAF-18 / PTEN
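The splitting idea above can be sketched in Python as follows. This is a minimal sketch, not the script currently in use. The function names split_punctuation and find_genes are hypothetical, and the approved gene list is passed in as a plain collection rather than read from gin_locus, gin_seqname, or gin_synonyms. Two behaviors are added assumptions that the idea above does not state: matching ignores case (so DAF-18 can match daf-18), and a period between two digits is kept so that names such as SIR-2.1 stay whole.

```python
import re

# Punctuation to split off: any character that is not a letter, digit,
# whitespace, dash, or period, plus any period that is not flanked by
# digits on both sides (assumption: this keeps names like SIR-2.1 intact).
_SPLIT = re.compile(r"[^A-Za-z0-9\s.\-]|\.(?!\d)|(?<!\d)\.")


def split_punctuation(text):
    """Surround punctuation (except dashes) with spaces and normalize whitespace."""
    spaced = _SPLIT.sub(lambda m: " " + m.group(0) + " ", text)
    return " ".join(spaced.split())


def find_genes(abstract, approved):
    """Return approved names found in the abstract, in order of first appearance.

    `approved` is any iterable of names; comparison is case-insensitive
    (an assumption of this sketch).
    """
    lookup = {name.lower(): name for name in approved}
    found = []
    for token in split_punctuation(abstract).split():
        name = lookup.get(token.lower())
        if name is not None and name not in found:
            found.append(name)
    return found
```

With an approved list containing daf-18, fem-1, and sir-2.1, calling find_genes on the text "DAF-18/PTEN and fem-1(hc17ts)" reports daf-18 and fem-1, which a whitespace-only split would not find.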
Papers with missed gene associations, for use in testing the script:
WBPaper00035164 - missed BLI-4
WBPaper00035239 - missed CATP-5
WBPaper00035289 - missed SPP-5
WBPaper00035423 - missed PAR-1
WBPaper00035449 - missed gas-1, isp-1, daf-2, sod-2
WBPaper00035474 - missed fem-1, fem-3
WBPaper00035490 - missed daf-16
WBPaper00035559 - missed NMY-2, GPR-1/2, LIN-5
WBPaper00037741 - missed DAF-18, PHA-4, HSF-1, SIR-2.1, AAK-2
WBPaper00037686 - missed bus-2, bus-4, bus-12, srf-3, bus-8, bus-17
WBPaper00037794 - missed MIG-14
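The list above could serve as a small regression check for any revised matcher. The following is a sketch under stated assumptions: MISSED and still_missed are hypothetical names, the abstracts themselves are not included and must be supplied by the caller, and the matcher argument is any function that takes an abstract and returns a list of gene names. GPR-1/2 is expanded to GPR-1 and GPR-2, which is an interpretation of the entry above.

```python
# Genes the current script missed, copied from the list above.
MISSED = {
    "WBPaper00035164": ["BLI-4"],
    "WBPaper00035239": ["CATP-5"],
    "WBPaper00035289": ["SPP-5"],
    "WBPaper00035423": ["PAR-1"],
    "WBPaper00035449": ["gas-1", "isp-1", "daf-2", "sod-2"],
    "WBPaper00035474": ["fem-1", "fem-3"],
    "WBPaper00035490": ["daf-16"],
    # Listed as GPR-1/2; expanded here on the assumption it means both genes.
    "WBPaper00035559": ["NMY-2", "GPR-1", "GPR-2", "LIN-5"],
    "WBPaper00037741": ["DAF-18", "PHA-4", "HSF-1", "SIR-2.1", "AAK-2"],
    "WBPaper00037686": ["bus-2", "bus-4", "bus-12", "srf-3", "bus-8", "bus-17"],
    "WBPaper00037794": ["MIG-14"],
}


def still_missed(paper_id, abstract, matcher):
    """Return the expected genes that `matcher` fails to report for this abstract.

    Names are compared case-insensitively (an assumption of this sketch).
    """
    reported = {gene.lower() for gene in matcher(abstract)}
    return [gene for gene in MISSED[paper_id] if gene.lower() not in reported]
```

An empty result for every paper would indicate that the revised matcher now catches all of the genes listed above.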
Gene Associations Based on Curated Data
Data types for which curation is stored in Postgres:
Antibody
Concise Descriptions
Gene Interactions
Gene Ontology
Gene Regulation
Picture
Transgenes
Variation Phenotype