Paper Pipeline Scripts

From WormBaseWiki

PMID Downloads and Paper Editor

  • get_new_elegans_xml.pl - downloads new XML records from a daily PubMed search for 'elegans'
    • resides here: /home/postgres/work/pgpopulation/wpa_papers/pmid_downloads/get_new_elegans_xml.pl
  • pap_match.pm - processes the PubMed xml records based on actions in the paper_editor.cgi
    • resides here: /home/postgres/work/pgpopulation/pap_papers/new_papers/pap_match.pm
    • On 2017-05-12, pap_match.pm was updated to strip the leading '0' from single-digit dates (the zero is added back when we dump the .ace file; see line 151 in /home/postgres/work/citace_upload/papers/dumpPapAce.pl)
      • We made this change because in late 2016 PubMed changed its date format from '1' to '01', '2' to '02', etc.
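
The round trip described above (strip the leading zero on import, add it back at dump time) can be sketched in shell as follows. This is only an illustration: the real logic lives in the Perl scripts pap_match.pm and dumpPapAce.pl, and the function names strip_leading_zero and pad_leading_zero are hypothetical.

```shell
# Illustrative sketch only; the real code is Perl in pap_match.pm and dumpPapAce.pl.
# Both function names are hypothetical.

# Strip the leading zero from a two-digit value such as '01' -> '1'.
strip_leading_zero() {
    case "$1" in
        0[0-9]) printf '%s\n' "${1#0}" ;;
        *)      printf '%s\n' "$1" ;;
    esac
}

# Add the zero back to a single-digit value such as '1' -> '01'.
pad_leading_zero() {
    case "$1" in
        [0-9]) printf '0%s\n' "$1" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

strip_leading_zero 05   # prints 5
pad_leading_zero 5      # prints 05
```

Pattern matching with case is used instead of printf '%02d' so that values like '08' and '09' are never misread as octal numbers.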

Ace File Generation

Papers

Before dumping the papers file, double-check the 'Find Dead Genes' list on the paper editor and make any necessary updates.

The papers cronjob is on the acedb account and runs at 02:00 every Thursday :

 0 2 * * thu /home/postgres/work/citace_upload/papers/wrapper.pl

It creates a file at :

 /home/postgres/work/citace_upload/papers/out/papers.ace.<date>

and symlinks it to :

 /home/postgres/public_html/cgi-bin/data/papers.ace

so you can see it on the web at :

 http://tazendra.caltech.edu/~postgres/cgi-bin/data/papers.ace

While the cronjob runs every Thursday, the wrapper only dumps when the day of the month is in the 20s or 30s (the 20th through the 31st).
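
The day-of-month gate can be pictured with a small shell check. This is a sketch of the rule as described here, not the contents of wrapper.pl; the function name should_dump is hypothetical, and it assumes the day is given as two digits, as printed by date +%d.

```shell
# Sketch of the "20something or 30something" rule; should_dump is a hypothetical name.
# Expects a two-digit day of month, e.g. the output of `date +%d`.
should_dump() {
    case "$1" in
        2[0-9]|3[0-9]) return 0 ;;  # day 20-31: dump
        *)             return 1 ;;  # earlier in the month: skip
    esac
}

if should_dump "$(date +%d)"; then
    echo "would dump papers.ace today"
else
    echo "would skip the dump today"
fi
```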

If you ever need to run it on a different week, try :

 /home/postgres/work/citace_upload/papers/dumpPapAce.pl > /home/postgres/work/citace_upload/papers/out/papers.ace.<date>
 rm /home/postgres/public_html/cgi-bin/data/papers.ace
 ln -s /home/postgres/work/citace_upload/papers/out/papers.ace.<date> /home/postgres/public_html/cgi-bin/data/papers.ace

You can then pick it up from spica by ssh-ing into it, cd-ing to the directory, removing the existing file, and running :

 wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/papers.ace"
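
If the manual run becomes routine, the dump-and-relink steps above could be wrapped in a small helper. The following is a sketch under assumptions: the function name dump_and_link is hypothetical, and the YYYYMMDD suffix used when no date is passed is a guess at the <date> format, which this page does not specify. It takes the dump command, output directory, and symlink path as arguments so nothing is hard-coded.

```shell
# Hypothetical helper: dump_and_link DUMP_CMD OUT_DIR LINK_PATH [DATE_SUFFIX]
# DATE_SUFFIX defaults to YYYYMMDD, which is an assumption about the <date> format.
dump_and_link() {
    dump_cmd=$1
    out_dir=$2
    link_path=$3
    suffix=${4:-$(date +%Y%m%d)}
    out_file="$out_dir/papers.ace.$suffix"

    # Dump to a temporary name first so a failed dump never replaces a good file.
    if ! "$dump_cmd" > "$out_file.tmp"; then
        rm -f "$out_file.tmp"
        return 1
    fi
    mv "$out_file.tmp" "$out_file"

    # Same as the manual steps: remove the old symlink, then point it at the new dump.
    rm -f "$link_path"
    ln -s "$out_file" "$link_path"
}
```

With the paths from this page, a call would look like :

 dump_and_link /home/postgres/work/citace_upload/papers/dumpPapAce.pl /home/postgres/work/citace_upload/papers/out /home/postgres/public_html/cgi-bin/data/papers.ace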