Paper Pipeline Scripts
From WormBaseWiki
Latest revision as of 20:59, 12 May 2017
PMID Downloads and Paper Editor
- get_new_elegans_xml.pl - downloads new XML records from the daily PubMed search for 'elegans'
  - resides here: /home/postgres/work/pgpopulation/wpa_papers/pmid_downloads/get_new_elegans_xml.pl
- pap_match.pm - processes the PubMed XML records based on actions taken in paper_editor.cgi
  - resides here: /home/postgres/work/pgpopulation/pap_papers/new_papers/pap_match.pm
  - Updated the pap_match.pm script on 2017-05-12 to strip the leading '0' from single-digit dates (it is added back when we dump the .ace file; see line 151 in /home/postgres/work/citace_upload/papers/dumpPapAce.pl)
    - We made this change because in late 2016 PubMed changed its date format from '1' to '01', '2' to '02', etc.
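The strip-then-restore handling of leading zeros described above can be illustrated with a small shell sketch. This is not the code in pap_match.pm or dumpPapAce.pl (both are Perl and are not shown on this page); the function names strip_leading_zero and pad_two_digits are made up for the example.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only; not taken from the real scripts.

# Strip one leading zero from a day or month value: "05" -> "5", "12" -> "12".
strip_leading_zero() {
    printf '%s\n' "${1#0}"
}

# Pad a day or month value back to two digits: "5" -> "05", "12" -> "12".
# The zero is stripped first so that a value like "08" is not read as octal.
pad_two_digits() {
    printf '%02d\n' "${1#0}"
}

strip_leading_zero 05   # what would be stored after the PubMed download
pad_two_digits 5        # what would be written when the .ace file is dumped
```

Storing the unpadded form and padding only at dump time keeps older records (from before PubMed's format change) and newer ones consistent.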
Ace File Generation
Papers
Before dumping the papers file, double-check the 'Find Dead Genes' list on the paper editor and make any necessary updates.
The papers cronjob is on the acedb account :
0 2 * * thu /home/postgres/work/citace_upload/papers/wrapper.pl
It creates a file at :
/home/postgres/work/citace_upload/papers/out/papers.ace.<date>
and symlinks it to :
/home/postgres/public_html/cgi-bin/data/papers.ace
so you can see it on the web at :
http://tazendra.caltech.edu/~postgres/cgi-bin/data/papers.ace
While the cronjob runs every Thursday, the wrapper only dumps when the day of the month is in the 20s or 30s (the 20th through the 31st).
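The day-of-month gate can be sketched in shell as follows. The real check lives in wrapper.pl, which is not shown here, so this is only an assumed reading of the rule (dump on the 20th through the 31st); the function name should_dump is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the wrapper's gate; the actual logic in wrapper.pl may differ.

# Succeed (exit status 0) when the two-digit day of the month is 20 through 31.
should_dump() {
    case "$1" in
        2[0-9]|3[01]) return 0 ;;
        *) return 1 ;;
    esac
}

today=$(date +%d)   # two-digit day of the month, e.g. "07" or "23"
if should_dump "$today"; then
    echo "day $today: would run dumpPapAce.pl"
else
    echo "day $today: skipping dump"
fi
```

Combined with a Thursday-only cron entry, a gate like this fires on one or two Thursdays per month, all of them late in the month.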
If you ever need to run it on a different week, try :
/home/postgres/work/citace_upload/papers/dumpPapAce.pl > /home/postgres/work/citace_upload/papers/out/papers.ace.<date>
rm /home/postgres/public_html/cgi-bin/data/papers.ace
ln -s /home/postgres/work/citace_upload/papers/out/papers.ace.<date> /home/postgres/public_html/cgi-bin/data/papers.ace
and then you can pick it up from spica by ssh-ing into it, changing to the directory, removing the existing file, and running :
wget "http://tazendra.caltech.edu/~postgres/cgi-bin/data/papers.ace"
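As a convenience, the manual dump steps on tazendra can be wrapped in a small dry-run helper that fills in the <date> placeholder and prints the commands for review. The paths are copied from the steps above; the function name manual_dump_commands and the YYYYMMDD date stamp are assumptions, since this page does not say which date format the wrapper uses.

```shell
#!/bin/sh
# Dry-run sketch: prints the manual dump commands instead of executing them.
# Paths come from this page; the date-stamp format is an assumption.
PAPERS_DIR=/home/postgres/work/citace_upload/papers
WEB_LINK=/home/postgres/public_html/cgi-bin/data/papers.ace

# Print the three commands for a given date stamp, one per line.
manual_dump_commands() {
    stamp=$1
    out="$PAPERS_DIR/out/papers.ace.$stamp"
    printf '%s\n' "$PAPERS_DIR/dumpPapAce.pl > $out"
    printf '%s\n' "rm $WEB_LINK"
    printf '%s\n' "ln -s $out $WEB_LINK"
}

# Example: show the commands for today's date.
manual_dump_commands "$(date +%Y%m%d)"
```

Printing rather than executing lets you check the date stamp and paths before running anything as the acedb or postgres user.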