Paper Pipeline - To Do List
Short-Term
Correct invalid PMIDs and their associated paper types - Kimberly
- How often do we want to check for invalid PMIDs?
- PubMed maintains a file of obsolete PMIDs on its FTP site: ftp://ftp.ncbi.nlm.nih.gov/pubmed/deleted_pmids.txt
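One possible way to use the obsolete-PMID file is sketched here, purely as an illustration. It assumes the file lists one numeric PMID per line and that our PMIDs are available as a list of strings. The function names are hypothetical and not part of any existing script. Downloading the file and querying postgres are left out.

```python
def parse_deleted_pmids(text):
    """Collect PMIDs from the file contents.

    Assumption: one numeric PMID per line; blank or non-numeric lines are skipped.
    """
    return {line.strip() for line in text.splitlines() if line.strip().isdigit()}


def find_obsolete(our_pmids, deleted_pmids):
    """Return the PMIDs we hold that also appear in the deleted set, in numeric order."""
    return sorted((p for p in our_pmids if p in deleted_pmids), key=int)


if __name__ == "__main__":
    # Made-up sample contents standing in for deleted_pmids.txt.
    sample = "123\n456\n\n789\n"
    deleted = parse_deleted_pmids(sample)
    print(find_obsolete(["456", "1000", "123"], deleted))
```

How often to run such a check is still the open question above. The sketch only shows that each run is a simple set lookup once the file has been fetched.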
Write up a summary of the weekly checking script - Kimberly and Juancarlos
- Runs every Sunday at 1 AM.
- Checks paper entries in postgres that are missing author, pages, title, type, or year.
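For the write-up, the core of the check could be illustrated roughly as follows. This is a minimal sketch, not the actual weekly script. It assumes each paper entry has already been read from postgres into a dictionary. The key paper_id and the function names are hypothetical. Only the five field names come from the description above.

```python
# Fields the weekly check looks for, as listed in the item above.
REQUIRED_FIELDS = ("author", "pages", "title", "type", "year")


def missing_fields(paper):
    """Return the required fields that are absent or empty for one paper entry."""
    return [f for f in REQUIRED_FIELDS if not str(paper.get(f) or "").strip()]


def incomplete_papers(papers):
    """Map each paper identifier (hypothetical key 'paper_id') to its missing fields."""
    report = {}
    for paper in papers:
        missing = missing_fields(paper)
        if missing:
            report[paper["paper_id"]] = missing
    return report


if __name__ == "__main__":
    # Made-up entries for illustration only.
    papers = [
        {"paper_id": "A", "author": "Smith", "pages": "1-10",
         "title": "Example", "type": "Journal_article", "year": "2010"},
        {"paper_id": "B", "author": "", "pages": None,
         "title": "Another example", "type": "Review", "year": "2011"},
    ]
    print(incomplete_papers(papers))
```

If the script is scheduled with cron (an assumption), the schedule entry for Sunday at 1 AM would be `0 1 * * 0`.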
Add a limited number of new paper types to allow for single type classification - Kimberly and Juancarlos
Finish documentation of current paper type mappings (PubMed vs postgres vs Journal) for informing SVMs and Textpresso searches - Caltech
Decide how to handle upload of Genetics papers - Karen, Juancarlos, Kimberly
Long-Term
Discuss timeline and implications for changing the Paper models to allow for multiple types - WormBase
Decide if, and how, we want to run a script that cross-checks PubMed and postgres data - Caltech
Explore the idea of an initial binary paper classification in PubMed for Journal Articles (primary experimental data vs. no primary experimental data) - WormBase, others