Difference between revisions of "Attaching Genes to Papers"

From WormBaseWiki

Revision as of 16:31, 8 December 2010

Gene Associations Based Upon Abstracts

When papers are added to postgres using the Enter New Papers function of the Paper Editor, the corresponding abstracts are scanned, via a script, for matches to loci, sequence names, and synonyms.

Postgres tables used for this are:

gin_locus

gin_seqname

gin_synonyms

To view the contents of these tables, run a query of the following form through referenceform.cgi:

SELECT * FROM gin_locus;


Updating the script:

The script that associates genes based upon abstracts misses some genes because of the way they are written in the abstract, for example when a gene name is directly adjacent to punctuation.

One idea for what to change:

1) Separate text from punctuation, with the exception of the dash (-), and then look for matches to the approved gene list.

For example, fem-1(hc17ts) would become fem-1 ( hc17ts )

DAF-18/PTEN would become DAF-18 / PTEN
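
The idea above can be sketched roughly as follows. This is only an illustrative sketch, not the production script: the function names, the in-memory list of approved names, and the case-insensitive comparison are all assumptions. Because names such as SIR-2.1 contain a period, the sketch also leaves a period in place when it sits between two digits; that refinement is likewise an assumption and not part of the proposal.

```python
import re

# Punctuation to split off: anything that is not a letter, digit, dash,
# whitespace, or period, plus any period that is NOT between two digits.
# Keeping "digit.digit" intact is an assumption made for names like sir-2.1.
PUNCT = re.compile(r"(?<!\d)\.|\.(?!\d)|[^A-Za-z0-9\s.\-]")


def split_punctuation(text):
    """Put spaces around punctuation (except dashes) and collapse whitespace."""
    spaced = PUNCT.sub(lambda m: " " + m.group(0) + " ", text)
    return " ".join(spaced.split())


def find_genes(text, approved_names):
    """Return approved names found in text, in order of first appearance.

    Matching is case-insensitive (an assumption), so DAF-18 matches daf-18.
    """
    lookup = {name.lower(): name for name in approved_names}
    found = []
    for token in split_punctuation(text).split():
        name = lookup.get(token.lower())
        if name is not None and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    print(split_punctuation("fem-1(hc17ts)"))
    print(split_punctuation("DAF-18/PTEN"))
    print(find_genes("Loss of DAF-18/PTEN and SIR-2.1.", ["daf-18", "sir-2.1"]))
```

Under this rule a shorthand such as GPR-1/2 is split into GPR-1, /, and 2, so only gpr-1 would be matched; recovering gpr-2 would need separate handling.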


Papers where gene associations have been missed for testing the script:

WBPaper00035164 - missed BLI-4

WBPaper00035239 - missed CATP-5

WBPaper00035289 - missed SPP-5

WBPaper00035423 - missed PAR-1

WBPaper00035449 - missed gas-1, isp-1, daf-2, sod-2

WBPaper00035474 - missed fem-1, fem-3

WBPaper00035490 - missed daf-16

WBPaper00035559 - missed NMY-2, GPR-1/2, LIN-5

WBPaper00037741 - missed DAF-18, PHA-4, HSF-1, SIR-2.1, AAK-2

WBPaper00037686 - missed bus-2, bus-4, bus-12, srf-3, bus-8, bus-17

WBPaper00037794 - missed MIG-14
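
One way to use the list above as a regression check is sketched below. The paper IDs and gene names are copied from the list; the `still_missed` helper and the case-insensitive comparison are assumptions for illustration, and the genes actually found for a paper would have to come from running the updated script on that paper's abstract.

```python
# Papers and the genes the current script missed, as listed above.
MISSED = {
    "WBPaper00035164": ["BLI-4"],
    "WBPaper00035239": ["CATP-5"],
    "WBPaper00035289": ["SPP-5"],
    "WBPaper00035423": ["PAR-1"],
    "WBPaper00035449": ["gas-1", "isp-1", "daf-2", "sod-2"],
    "WBPaper00035474": ["fem-1", "fem-3"],
    "WBPaper00035490": ["daf-16"],
    "WBPaper00035559": ["NMY-2", "GPR-1/2", "LIN-5"],
    "WBPaper00037741": ["DAF-18", "PHA-4", "HSF-1", "SIR-2.1", "AAK-2"],
    "WBPaper00037686": ["bus-2", "bus-4", "bus-12", "srf-3", "bus-8", "bus-17"],
    "WBPaper00037794": ["MIG-14"],
}


def still_missed(paper_id, found_genes):
    """Return the listed genes for paper_id that are absent from found_genes.

    Names are compared case-insensitively (an assumption).
    """
    found = {gene.lower() for gene in found_genes}
    return [gene for gene in MISSED[paper_id] if gene.lower() not in found]


if __name__ == "__main__":
    # Hypothetical result: the updated script found only fem-1 for this paper.
    print(still_missed("WBPaper00035474", ["FEM-1"]))
```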


This will update the script we run on incoming papers, but we will need to decide what to do about previous papers. About a year ago, Juancarlos wrote a script to retroactively associate proteins with papers; perhaps we could similarly modify that script to run on previous papers?

Gene Associations Based on Curated Data

Data types for which curation is stored in postgres:

Antibody

Concise Descriptions

Gene Interactions

Gene Ontology

Gene Regulation

Picture

Transgenes

Variation Phenotype

