Paper Pipeline - To Do List
Revision as of 11:32, 21 July 2009
Short-Term
Correct invalid PMIDs and their associated paper types - Kimberly
- How often do we want to check for invalid PMIDs?
- PubMed maintains a file of obsolete PMIDs on its FTP site: ftp://ftp.ncbi.nlm.nih.gov/pubmed/deleted_pmids.txt
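A minimal sketch of the invalid-PMID check follows. It assumes the deleted-PMID file holds one PMID per line and that the PMIDs stored in postgres are already loaded as a Python collection. The helper names (`parse_deleted_pmids`, `fetch_deleted_pmids`, `find_invalid_pmids`) are hypothetical and do not come from an existing WormBase script.

```python
"""Sketch: flag stored PMIDs that PubMed lists as deleted.

All helper names here are hypothetical, not part of an existing WormBase script.
"""
from typing import Iterable
from urllib.request import urlopen

DELETED_PMIDS_URL = "ftp://ftp.ncbi.nlm.nih.gov/pubmed/deleted_pmids.txt"


def parse_deleted_pmids(lines: Iterable[str]) -> set[str]:
    """Collect PMIDs from the deleted-PMID list, assumed to hold one PMID per line."""
    deleted = set()
    for line in lines:
        pmid = line.strip()
        if pmid.isdigit():  # ignore blank or non-numeric lines
            deleted.add(pmid)
    return deleted


def fetch_deleted_pmids(url: str = DELETED_PMIDS_URL) -> set[str]:
    """Download the list (urllib's default opener handles ftp:// URLs) and parse it."""
    with urlopen(url) as resp:
        text = resp.read().decode("ascii", errors="replace")
    return parse_deleted_pmids(text.splitlines())


def find_invalid_pmids(stored: Iterable[str], deleted: set[str]) -> list[str]:
    """Return the stored PMIDs that appear in the deleted list, in numeric order."""
    return sorted({p for p in stored if p in deleted}, key=int)


if __name__ == "__main__":
    # Example data only; real input would be the FTP file and the PMIDs in postgres.
    deleted = parse_deleted_pmids(["12345\n", "67890\n", "\n"])
    print(find_invalid_pmids(["111", "67890", "12345"], deleted))  # ['12345', '67890']
```

Keeping the parsing separate from the download lets the comparison be tested offline, and it works the same whether the check ends up running weekly or less often.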
Write up a summary of the weekly checking script - Kimberly and Juancarlos
- Runs every Sunday at 1 AM.
- Checks for paper entries in postgres that are missing an author, pages, title, type, or year.
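The missing-field check can be sketched as follows. This assumes the job is scheduled with cron and that each paper record has been read from postgres into a dict keyed by the five fields named above. The script path, record layout, and example values are hypothetical.

```python
"""Sketch of the weekly missing-field check.

Scheduling is assumed to be done with cron; a crontab line such as
    0 1 * * 0  /path/to/check_papers.py
runs a script every Sunday at 1 AM (minute 0, hour 1, weekday 0 = Sunday).
The script path and the record layout below are hypothetical.
"""

REQUIRED_FIELDS = ("author", "pages", "title", "type", "year")


def missing_fields(paper: dict) -> list[str]:
    """Return the required fields that are absent, None, or blank in one record."""
    return [f for f in REQUIRED_FIELDS if not str(paper.get(f) or "").strip()]


def weekly_report(papers: dict[str, dict]) -> dict[str, list[str]]:
    """Map each paper ID to its missing fields, skipping complete records."""
    report = {}
    for paper_id, paper in papers.items():
        gaps = missing_fields(paper)
        if gaps:
            report[paper_id] = gaps
    return report


if __name__ == "__main__":
    # Example records only; IDs and the type value are made up for illustration.
    papers = {
        "p1": {"author": "Smith J", "pages": "1-10", "title": "A study",
               "type": "journal article", "year": 2009},
        "p2": {"title": "Another study", "year": 2008, "author": " "},
    }
    print(weekly_report(papers))  # {'p2': ['author', 'pages', 'type']}
```

A whitespace-only value is treated as missing, which is one reasonable reading of "missing". The actual script may apply a stricter or looser rule.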
Add a limited number of new paper types to allow for single-type classification - Kimberly and Juancarlos
Finish documentation of current paper type mappings (PubMed vs postgres vs Journal) for informing SVMs and Textpresso searches - Caltech
Decide how to handle upload of Genetics papers - Karen, Juancarlos, Kimberly
Long-Term
Discuss the timeline and implications of changing the Paper models to allow for multiple types - WormBase
Decide if, and how, we want to run a script that cross-checks PubMed and postgres data - Caltech
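Below is one possible shape for such a cross-check. It is a sketch only, and it assumes both sources have already been loaded as `{pmid: {field: value}}` dicts. The compared fields (`title`, `year`, `pages`) and the normalization rule are illustrative choices, not decisions that have been made.

```python
"""Sketch of a PubMed-vs-postgres cross-check (hypothetical data layout).

Both sources are assumed to be loaded already as {pmid: {field: value}} dicts.
"""

FIELDS = ("title", "year", "pages")  # fields compared; an assumption

Mismatch = tuple[str, str, str, str]  # (pmid, field, pubmed value, postgres value)


def normalize(value) -> str:
    """Collapse whitespace and case so cosmetic differences are not reported."""
    return " ".join(str(value if value is not None else "").split()).casefold()


def show(value) -> str:
    """Render a value for the report, printing None as an empty string."""
    return "" if value is None else str(value)


def cross_check(pubmed: dict[str, dict], postgres: dict[str, dict],
                fields: tuple[str, ...] = FIELDS) -> list[Mismatch]:
    """List every disagreement, including PMIDs found in only one source."""
    mismatches: list[Mismatch] = []
    for pmid in sorted(set(pubmed) | set(postgres), key=int):
        pm_rec, pg_rec = pubmed.get(pmid), postgres.get(pmid)
        if pm_rec is None or pg_rec is None:
            # The record exists on only one side.
            mismatches.append((pmid, "<record>",
                               "present" if pm_rec is not None else "absent",
                               "present" if pg_rec is not None else "absent"))
            continue
        for field in fields:
            a, b = pm_rec.get(field), pg_rec.get(field)
            if normalize(a) != normalize(b):
                mismatches.append((pmid, field, show(a), show(b)))
    return mismatches


if __name__ == "__main__":
    pubmed = {"100": {"title": "A  Study", "year": 2009, "pages": "1-5"},
              "200": {"title": "B", "year": 2008, "pages": "7"}}
    postgres = {"100": {"title": "a study", "year": "2009", "pages": "1-6"},
                "300": {"title": "C", "year": 2007, "pages": "9"}}
    for row in cross_check(pubmed, postgres):
        print(row)
```

Paper type is deliberately left out of the compared fields. PubMed and postgres use different type vocabularies, so a raw comparison would flag nearly every record until the type mappings are documented.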
Explore the idea of an initial binary classification, in PubMed, of Journal Articles into those with primary experimental data and those without - WormBase, others