Paper Pipeline - To Do List

From WormBaseWiki

Short-Term

Correct invalid PMIDs and their associated paper types - Kimberly

Write up a summary of the weekly checking script - Kimberly and Juancarlos

  • Runs every Sunday at 1AM.
  • Checks for paper entries in postgres that are missing an author, pages, title, type, or year.
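
The check described above could be sketched roughly as follows. This is a minimal illustration, not the actual script: the in-memory record layout, the paper IDs, and the crontab path are assumptions, and only the five field names come from the list above.

```python
# Hypothetical crontab entry for "every Sunday at 1AM" (the path is a placeholder):
#   0 1 * * 0  /path/to/weekly_paper_check.py

# The five fields named in the summary above.
REQUIRED_FIELDS = ("author", "pages", "title", "type", "year")


def find_incomplete_papers(papers):
    """Return {paper_id: [missing fields]} for entries lacking any required field.

    `papers` is assumed to be a dict of paper_id -> dict of field values;
    the real postgres layout is not described in this page.
    """
    report = {}
    for paper_id, record in papers.items():
        # A field counts as missing if it is absent, None, or blank.
        missing = [
            field for field in REQUIRED_FIELDS
            if not str(record.get(field) or "").strip()
        ]
        if missing:
            report[paper_id] = missing
    return report


# Made-up records for illustration only.
example_papers = {
    "paper_A": {"author": "A. Author", "pages": "1-10",
                "title": "Complete entry", "type": "ARTICLE", "year": 2009},
    "paper_B": {"author": "", "pages": None,
                "title": "No author or pages", "type": "REVIEW", "year": 2010},
    "paper_C": {"author": "B. Author", "pages": "5-6",
                "title": "  ", "type": "ARTICLE"},
}

for paper_id, missing in sorted(find_incomplete_papers(example_papers).items()):
    print(paper_id, "is missing:", ", ".join(missing))
```

In a real run the records would come from a postgres query rather than a literal dict; the missing-field test itself would stay the same.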

Add a limited number of new paper types to allow for single type classification - Kimberly and Juancarlos

  • From our conversion of papers with type OTHER, we found several cases where a paper had a single type that is not on our current list of paper types.
  • Type (number of papers):
  • Interview (4)
  • Lectures (2)
  • Congresses (1)
  • Interactive tutorial (1)

Finish documentation of current paper type mappings (PubMed vs postgres vs Journal) for informing SVMs and Textpresso searches - Caltech

  • For CCC, I'd initially just searched through all paper types, but removing some paper types from the searches would more accurately reflect the real curation pipeline.
  • Eliminating some types is obvious, e.g. MEETING and GAZETTE ABSTRACTS, REVIEWS, but what about the others, e.g. COMMENT, LETTERS, etc?
  • It occurred to me that I don't really know what types of papers are included in some of the other paper types like COMMENT and LETTERS, so it'd be good to know so I can make a better decision about what to include.
  • To start to address this, I looked at all papers in postgres with type COMMENT and checked both their PubMed paper type classification and what these papers actually are in the journals: Summary of Papers Labeled COMMENT
  • This led to the conclusion that type COMMENT should NOT be included in the CCC pipeline.
  • Additional postgres types that should be checked: NEWS, NOTE, LETTER, MONOGRAPH, EDITORIAL and perhaps a few others like CORRECTION and ERRATUM
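
One way to keep track of these decisions while the remaining types are being reviewed is a small triage step ahead of the searches. The sketch below is only an illustration: the exact type labels stored in postgres (for example MEETING_ABSTRACT) and the paper IDs are assumptions, and the three-way split is a suggestion rather than the current pipeline.

```python
# Types already judged unsuitable for CCC in the notes above.
# The exact label spellings are assumptions.
EXCLUDED_TYPES = {"MEETING_ABSTRACT", "GAZETTE_ABSTRACT", "REVIEW", "COMMENT"}

# Types listed above as still needing a check before a decision is made.
UNDECIDED_TYPES = {"NEWS", "NOTE", "LETTER", "MONOGRAPH",
                   "EDITORIAL", "CORRECTION", "ERRATUM"}


def triage(paper_types):
    """Split papers into include / exclude / undecided by their postgres type.

    `paper_types` is assumed to be a dict of paper_id -> type label.
    """
    decisions = {"include": [], "exclude": [], "undecided": []}
    for paper_id, paper_type in paper_types.items():
        label = paper_type.strip().upper()
        if label in EXCLUDED_TYPES:
            decisions["exclude"].append(paper_id)
        elif label in UNDECIDED_TYPES:
            decisions["undecided"].append(paper_id)
        else:
            decisions["include"].append(paper_id)
    return decisions


# Made-up IDs for illustration only.
example_types = {
    "paper_A": "ARTICLE",
    "paper_B": "COMMENT",
    "paper_C": "letter",
    "paper_D": "REVIEW",
}
print(triage(example_types))
```

Keeping the undecided types in their own bucket means they can be moved to include or exclude one at a time as each summary (like the one for COMMENT) is completed.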

Decide how to handle upload of Genetics papers - Karen, Juancarlos, Kimberly

Long-Term

Discuss timeline and implications for changing the Paper models and Paper editor to allow for multiple types - WormBase

  • Which types should we include?
  • Will any of the current paper type mappings be retired? If so, which ones?
  • Do we want to include paper types that PubMed doesn't have, e.g. Book?
  • How will we update our records? What should be done with history?
  • Will this affect the web display for papers in any way?
  • Current list of all PubMed types represented by papers previously classified as type OTHER in postgres:

  • journal article
  • comparative study
  • evaluation studies
  • review
  • english abstract
  • validation studies
  • case reports
  • in vitro
  • interview
  • letter
  • lectures
  • comment
  • historical article
  • controlled clinical trial
  • interactive tutorial
  • biography
  • randomized controlled trial
  • retracted publication
  • congresses
  • multicenter study

Decide if, and how, we want to run a script that cross-checks PubMed and postgres data - Caltech
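
As a starting point for that discussion, a cross-check could be as simple as a field-by-field comparison of the two records for one paper. The sketch below assumes both records have already been fetched into plain dicts with matching field names; the field list, the normalization rules, and the sample values are all hypothetical.

```python
def normalize(value):
    """Collapse whitespace and ignore case so trivial differences are not flagged."""
    return " ".join(str(value or "").split()).casefold()


def cross_check(pubmed_record, postgres_record,
                fields=("title", "year", "pages", "type")):
    """Return {field: (pubmed value, postgres value)} for fields that disagree."""
    differences = {}
    for field in fields:
        pubmed_value = pubmed_record.get(field)
        postgres_value = postgres_record.get(field)
        if normalize(pubmed_value) != normalize(postgres_value):
            differences[field] = (pubmed_value, postgres_value)
    return differences


# Made-up records for illustration only.
pubmed_example = {"title": "A Study of  Worms", "year": "2008",
                  "pages": "1-10", "type": "journal article"}
postgres_example = {"title": "a study of worms", "year": 2008,
                    "pages": "1-12", "type": "ARTICLE"}

for field, (pubmed_value, postgres_value) in cross_check(
        pubmed_example, postgres_example).items():
    print(f"{field}: PubMed={pubmed_value!r} postgres={postgres_value!r}")
```

Note that the type field will differ for almost every paper unless PubMed types are first translated through the paper type mappings, so a real script would either apply that mapping or leave type out of the comparison.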

Explore the idea of an initial binary paper classification in PubMed for Journal Articles (primary experimental data vs. no primary experimental data) - WormBase, others