Attaching Genes to Papers

From WormBaseWiki

Gene Associations Based Upon Abstracts

When papers are added to postgres using the Enter New Papers function of the Paper Editor, a script scans the corresponding abstracts for matches to loci, sequence names, and synonyms.

Postgres tables used for this are:

gin_locus

gin_seqname

gin_synonyms

To view the contents of these tables, run a query like the following through referenceform.cgi:

SELECT * FROM gin_locus;


Updating the script:

The script that associates genes based upon abstracts misses some genes because of how their names are written in the abstract, for example when a name is directly attached to punctuation.

Some ideas on what to change:

1) Separate words and letters from any adjacent punctuation, except for a dash (-), and then look for matches to the approved gene list.

For example, fem-1(hc17ts) would become fem-1 ( hc17ts )

DAF-18/PTEN would become DAF-18 / PTEN
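
The idea above can be sketched roughly as follows. This is a minimal illustration, not the existing script: the function names split_punctuation and find_genes are hypothetical, the approved gene list is passed in as a plain Python list rather than read from gin_locus, gin_seqname, or gin_synonyms, and case-insensitive matching is an assumption added here so that forms like DAF-18 match daf-18. Note that under the rule as stated, a name containing a period (such as SIR-2.1) would also be split, so the period may need to be treated as a second exception.

```python
import re

# Any character that is not a letter, digit, dash, or whitespace is treated
# as punctuation and padded with spaces so that it becomes its own token.
# (The underscore is handled separately because \w includes it.)
PUNCT = re.compile(r"([^\w\s-]|_)")


def split_punctuation(text):
    """Separate punctuation (except dashes) from the surrounding text."""
    padded = PUNCT.sub(r" \1 ", text)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(padded.split())


def find_genes(abstract, approved_names):
    """Return approved names found as whole tokens in the abstract.

    Matching is case-insensitive (an assumption, not a documented
    behavior of the real script); results keep the approved spelling.
    """
    lookup = {name.lower(): name for name in approved_names}
    found = []
    for token in split_punctuation(abstract).split():
        name = lookup.get(token.lower())
        if name and name not in found:
            found.append(name)
    return found


print(split_punctuation("fem-1(hc17ts)"))  # fem-1 ( hc17ts )
print(split_punctuation("DAF-18/PTEN"))    # DAF-18 / PTEN
```

Matching on whole tokens after the split, rather than on substrings, keeps a short name from matching inside a longer word.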


Papers with missed gene associations, for use in checking and re-training the script:

WBPaper00035164 - missed BLI-4

WBPaper00035239 - missed CATP-5

WBPaper00035289 - missed SPP-5

WBPaper00035423 - missed PAR-1

WBPaper00035449 - missed gas-1, isp-1, daf-2, sod-2

WBPaper00035474 - missed fem-1, fem-3

WBPaper00035490 - missed daf-16

WBPaper00035559 - missed NMY-2, GPR-1/2, LIN-5

WBPaper00037741 - missed DAF-18, PHA-4, HSF-1, SIR-2.1, AAK-2

WBPaper00037686 - missed bus-2, bus-4, bus-12, srf-3, bus-8, bus-17

WBPaper00037794 - missed MIG-14/Wls
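
A list in this form could serve as a simple regression check after the script is changed. The sketch below is only an illustration under stated assumptions: parse_missed and still_missed are hypothetical helper names, the lines are assumed to follow the "paper ID - missed gene, gene" pattern used above, and found_by_script stands in for whatever output the updated script produces per paper.

```python
def parse_missed(lines):
    """Turn 'WBPaper... - missed A, B' lines into {paper_id: [genes]}."""
    expected = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip the blank lines between entries
        paper_id, _, genes = line.partition(" - missed ")
        expected[paper_id] = [g.strip() for g in genes.split(",")]
    return expected


def still_missed(expected, found_by_script):
    """Return the listed genes that the script still does not report.

    found_by_script maps a paper ID to the gene names the script found.
    Names are compared case-insensitively (an assumption of this sketch).
    """
    report = {}
    for paper_id, genes in expected.items():
        found = {g.lower() for g in found_by_script.get(paper_id, [])}
        missing = [g for g in genes if g.lower() not in found]
        if missing:
            report[paper_id] = missing
    return report


expected = parse_missed([
    "WBPaper00035474 - missed fem-1, fem-3",
    "WBPaper00035490 - missed daf-16",
])
print(still_missed(expected, {"WBPaper00035474": ["fem-1"]}))
```

An empty result would mean every gene on the list is now picked up.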

Gene Associations Based on Curated Data

Data types for which curation is stored in postgres:

Antibody

Concise Descriptions

Gene Interactions

Gene Ontology

Gene Regulation

Picture

Transgenes

Variation Phenotype

