Getting Papers into WormBase

From WormBaseWiki

From Kimberly:

  • This is how I've been uploading papers as of ~February 2009:
    • Every day at 6 AM PST, an automated script searches PubMed for the keyword 'elegans'.
    • The script lives here: /home/postgres/work/pgpopulation/wpa_papers/pmid_downloads/get_new_elegans_xml.pl
    • The script also runs the PubMed 'not final' query and calls the pmid2doi script to update the DOIs.
    • Initially, the XML records are stored by PMID in the xml directory: /home/postgres/work/pgpopulation/wpa_papers/pmid_downloads/xml
      • Each file holds the PubMed XML for one PMID.
      • When a paper is reviewed, one of three things happens. It is:
        • 1) accepted, and its XML is moved into the done directory: /home/postgres/work/pgpopulation/wpa_papers/pmid_downloads/done
        • 2) rejected, and its XML is moved into the rejected_pmids file: /home/postgres/work/pgpopulation/wpa_papers/pmid_downloads/rejected_pmids
        • 3) removed, and its XML is moved into the removed_pmids file: /home/postgres/work/pgpopulation/wpa_papers/pmid_downloads/removed_pmids
    • The results of this search are presented in the Paper Editor.
    • To enter new papers, select your name from the curator list, scroll to the bottom of the page, and click the link that says 'Enter New Papers!'
    • The New Papers page lists the PMID and abstract for each returned paper. Alongside each abstract are radio buttons that let the curator accept or reject the paper.
    • Accepted papers are then processed and assigned a WBPaper ID.
    • The PMIDs for rejected papers are stored in a file (/home/postgres/work/pgpopulation/wpa_papers/pmid_downloads/rejected_pmids) to be used as a negative training set for machine learning approaches to paper identification (triage).
  • Duplicate papers in PubMed
    • Occasionally, a paper appears twice in PubMed. Since February 2009, this has happened for two papers from PLoS Genetics. In each case, PubMed deprecated the first PubMed ID and retained the second.
    • I had approved the papers the first time they showed up in the Paper Editor, but wasn't sure what to do when they came in a second time.
    • From discussions with Daniel and Juancarlos, here is what to do if that happens again:
      • Double-check which of the two IDs is valid in PubMed.
      • Once you've confirmed which ID is valid, use the Paper Editor to mark the old (now deprecated) PubMed ID as invalid in Postgres and add the new, valid PubMed ID.
  • Papers that I accept:
  • Papers that I typically reject:
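
The production script is Perl (get_new_elegans_xml.pl), and this page does not show how it talks to PubMed. As a minimal sketch only, assuming the daily search goes through NCBI's public E-utilities (esearch to list matching PMIDs, efetch to download each record's XML), the steps could look like this in Python. The one-day reldate/datetype window, the retmax value, and all function names are illustrative assumptions, not the script's real settings.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# NCBI E-utilities base URL. ASSUMPTION: the WormBase Perl script may query PubMed differently.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str = "elegans", days: int = 1, retmax: int = 500) -> str:
    """Build an esearch URL for PubMed records matching `term` added in the last `days` days."""
    params = {
        "db": "pubmed",
        "term": term,
        "reldate": days,     # only records dated within the last N days...
        "datetype": "edat",  # ...where the date is the Entrez date (when the record was added)
        "retmax": retmax,    # maximum number of PMIDs returned
    }
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def parse_pmids(esearch_xml: str) -> list[str]:
    """Extract the PMIDs from an esearch XML response (<IdList><Id>...</Id></IdList>)."""
    root = ET.fromstring(esearch_xml)
    return [id_el.text for id_el in root.findall("./IdList/Id")]


def efetch_url(pmid: str) -> str:
    """Build an efetch URL that returns the full PubMed XML record for one PMID."""
    params = {"db": "pubmed", "id": pmid, "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def fetch_text(url: str) -> str:
    """Download a URL as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

A live run would pass fetch_text(esearch_url()) to parse_pmids and then save fetch_text(efetch_url(pmid)) for each PMID into the xml directory. NCBI rate-limits E-utilities (about 3 requests per second without an API key), so a real loop should pause between efetch calls.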

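The three review outcomes amount to moving each PMID's XML out of the xml directory according to the curator's decision, with rejected PMIDs later reused as a negative training set. Below is a minimal Python sketch under stated assumptions: XML files are named by bare PMID, and rejected_pmids and removed_pmids are plain-text lists with one PMID per line. The page says the XML is "moved into" those files, so their real format may differ. triage and read_pmid_list are hypothetical helper names.

```python
import shutil
from pathlib import Path

# Hypothetical layout mirroring the pmid_downloads directory described above.
# ASSUMPTIONS: one XML file per PMID, named after the PMID; rejected_pmids and
# removed_pmids are plain-text files holding one PMID per line.


def triage(base: Path, pmid: str, decision: str) -> None:
    """Move a reviewed PMID out of xml/ according to the curator's decision."""
    src = base / "xml" / pmid
    if decision == "accept":
        done = base / "done"
        done.mkdir(exist_ok=True)
        shutil.move(str(src), str(done / pmid))  # accepted XML goes to done/
    elif decision in ("reject", "remove"):
        name = "rejected_pmids" if decision == "reject" else "removed_pmids"
        with (base / name).open("a") as fh:
            fh.write(pmid + "\n")  # record the PMID in the matching list file
        src.unlink()  # this sketch does not keep the XML itself
    else:
        raise ValueError(f"unknown decision: {decision!r}")


def read_pmid_list(path: Path) -> set[str]:
    """Load a one-PMID-per-line file, e.g. rejected_pmids as a negative training set."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}
```

For triage experiments, read_pmid_list(base / "rejected_pmids") would give the negative examples, and the PMIDs found in done/ could serve as the positive set.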

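The duplicate-PMID fix happens through the Paper Editor, and this page does not describe the underlying Postgres tables. Purely to illustrate the bookkeeping (flag the deprecated PubMed ID as invalid, then attach the one PubMed retained), here is a toy in-memory Python sketch. PaperRecord and its fields are hypothetical and do not reflect the real schema; the WBPaper ID used below is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class PaperRecord:
    """Toy stand-in for one WBPaper's PubMed IDs (hypothetical; not the Postgres schema)."""
    wbpaper_id: str
    pmids: dict[str, bool] = field(default_factory=dict)  # PMID -> currently valid?

    def replace_pmid(self, old: str, new: str) -> None:
        """Mark a deprecated PMID invalid and attach the PMID that PubMed retained."""
        if old not in self.pmids:
            raise KeyError(f"{old} is not attached to {self.wbpaper_id}")
        self.pmids[old] = False  # keep the old ID on record, but flag it invalid
        self.pmids[new] = True

    def valid_pmids(self) -> list[str]:
        """Return only the PMIDs still marked valid, in insertion order."""
        return [p for p, ok in self.pmids.items() if ok]
```

Keeping the old ID flagged rather than deleted leaves a trace of the deprecated PMID, which helps if it shows up again in a later search.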