Paper Pipeline Scripts
From WormBaseWiki
PMID Downloads and Paper Editor
- get_new_elegans_xml.pl - downloads new XML records from the daily PubMed search for 'elegans'
- resides here: /home/postgres/work/pgpopulation/wpa_papers/pmid_downloads/get_new_elegans_xml.pl
- pap_match.pm - processes the PubMed XML records based on actions taken in paper_editor.cgi
- resides here: /home/postgres/work/pgpopulation/pap_papers/new_papers/pap_match.pm
- Updated pap_match.pm on 2017-05-12 to strip the leading '0' from single-digit dates (it is added back when we dump the .ace file; see line 151 in /home/postgres/work/citace_upload/papers/dumpPapAce.pl)
- We made this change because in late 2016 PubMed started zero-padding single-digit dates ('1' became '01', '2' became '02', etc.)
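The leading-zero round trip described above can be sketched in shell as follows. This is only an illustration under assumptions: pap_match.pm and dumpPapAce.pl are Perl and their actual code is not reproduced here, and the function names strip_leading_zero and pad_two_digits are hypothetical.

```shell
# Hypothetical helpers illustrating the leading-zero round trip.
# Not taken from pap_match.pm or dumpPapAce.pl.

# Remove the leading '0' from a two-character value ('01' -> '1').
strip_leading_zero() {
    case "$1" in
        0?) printf '%s\n' "${1#0}" ;;
        *)  printf '%s\n' "$1" ;;
    esac
}

# Pad a single-character value back to two characters ('1' -> '01').
pad_two_digits() {
    case "$1" in
        ?) printf '0%s\n' "$1" ;;
        *) printf '%s\n' "$1" ;;
    esac
}

strip_leading_zero 01   # prints 1
strip_leading_zero 12   # prints 12
pad_two_digits 1        # prints 01
```

Storing the stripped form and padding only at dump time means records downloaded before and after the PubMed format change compare equal in the database.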
Ace File Generation
Papers
Before dumping the papers file, double-check the 'Find Dead Genes' list on the paper editor and make any necessary updates.
The papers cronjob is on the acedb account and runs at 02:00 every Thursday :
0 2 * * thu /home/postgres/work/citace_upload/papers/wrapper.pl
It creates a file at :
/home/postgres/work/citace_upload/papers/out/papers.ace.<date>
and symlinks it to :
/home/postgres/public_html/cgi-bin/data/papers.ace
so you can see it on the web at :
http://tazendra.caltech.edu/~postgres/cgi-bin/data/papers.ace
While the cronjob runs every Thursday, the wrapper only dumps when the day of the month is in the 20s or 30s (the 20th through the 31st).
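The date gate described above could look something like the sketch below. wrapper.pl is Perl and its real logic is not shown on this page, so this is an assumption about its behavior: the function name is_dump_day is hypothetical, and "day of month is 20 or later" is an interpretation of the rule.

```shell
# Hypothetical sketch of the wrapper's date gate; wrapper.pl's real code may differ.
# Succeeds (exit status 0) when the day of the month is 20 through 31.
is_dump_day() {
    day=${1#0}            # drop a leading zero so '08' is compared as 8
    [ "$day" -ge 20 ]
}

today=$(date +%d)         # two-digit day of month, e.g. '23'
if is_dump_day "$today"; then
    echo "day $today: would dump papers.ace"
else
    echo "day $today: no dump this week"
fi
```

Because days 20 through 26 always contain exactly one Thursday, a gate like this yields at least one dump per month, and a second one in months where another Thursday falls on the 27th or later.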
If you ever need to run it on a different week, try :
/home/postgres/work/citace_upload/papers/dumpPapAce.pl > /home/postgres/work/citace_upload/papers/out/papers.ace.<date>
rm /home/postgres/public_html/cgi-bin/data/papers.ace
ln -s /home/postgres/work/citace_upload/papers/out/papers.ace.<date> /home/postgres/public_html/cgi-bin/data/papers.ace
You can then pick it up from spica: ssh in, cd to the directory, remove the existing file, and run :
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/papers.ace"
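For repeated manual runs, the dump and relink steps could be wrapped in a small function like the sketch below. It is not part of the pipeline: the name manual_dump and the environment overrides (DUMP_CMD, OUT_DIR, LINK) are hypothetical additions for illustration, with defaults set to the paths given on this page. The format of the <date> suffix is not specified here, so it is passed in as an argument rather than computed.

```shell
# Hypothetical wrapper around the manual dump steps; not part of the pipeline.
# Defaults are the paths from this page; the overrides exist only for dry runs.
DUMP_CMD=${DUMP_CMD:-/home/postgres/work/citace_upload/papers/dumpPapAce.pl}
OUT_DIR=${OUT_DIR:-/home/postgres/work/citace_upload/papers/out}
LINK=${LINK:-/home/postgres/public_html/cgi-bin/data/papers.ace}

# manual_dump STAMP
# STAMP is the <date> suffix, in whatever form the out/ directory already uses.
manual_dump() {
    stamp=$1
    if [ -z "$stamp" ]; then
        echo "usage: manual_dump <date>" >&2
        return 2
    fi
    target="$OUT_DIR/papers.ace.$stamp"
    # Dump first; leave the existing symlink alone if the dump fails.
    "$DUMP_CMD" > "$target" || return 1
    rm -f "$LINK"
    ln -s "$target" "$LINK"
}
```

The one deliberate difference from running the three commands by hand is that the symlink is only replaced after the dump succeeds, so a failed dump does not leave the web copy pointing at a broken file.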