Several possible strategies to test.

Ideally, we'd like gene-paper associations to be made as quickly as possible (better for users, and helpful for curation).

Different pipelines will likely be needed for different types of publications:

  1. Smaller scale research papers
  2. High-throughput, large-scale papers
  3. Reviews, Comments, etc.

Results may differ for sectioned vs. non-sectioned papers, and will likely differ for Reviews, etc. Strategies to test, followed by open questions:

  1. Abstracts
  2. Gene frequency
  3. Genes in Results (or equivalent)
  4. Gene frequency in Results (or equivalent)
  5. Genes mentioned along with a word in the Figure or Table category
  6. Genes for which there is curated data
  7. Some combination of the above
  8. What to do about large-scale papers
  9. What to do about new gene names not yet in WB
  10. What to do about supplemental data
  11. What to do about non-standard nomenclature
  12. Current pipeline - what happens when gene IDs are made invalid?
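
As a rough illustration of how items 1 to 4 and 7 might be combined, here is a minimal Python sketch that counts gene mentions per section and sums them into one weighted score per gene. Everything named here is an assumption for illustration only: the simplified gene-name pattern (real matching would presumably use the WB gene name list), the section names, and the weights, which are placeholders and not tested values.

```python
import re
from collections import Counter

# Assumption: a simplified pattern for standard C. elegans gene names such as
# "daf-16" or "unc-22". It will miss non-standard nomenclature and new names.
GENE_PATTERN = re.compile(r"\b[a-z]{3,4}-\d+\b")

# Hypothetical weights per section; placeholder values that would need tuning.
SECTION_WEIGHTS = {"abstract": 3.0, "results": 2.0, "other": 1.0}


def count_genes(text):
    """Return a Counter of gene-name mentions in one block of text."""
    return Counter(GENE_PATTERN.findall(text))


def score_genes(sections):
    """Combine per-section mention counts into one weighted score per gene.

    `sections` maps a section name (e.g. "abstract", "results") to its text.
    Sections without a listed weight fall back to the "other" weight.
    """
    scores = Counter()
    for name, text in sections.items():
        weight = SECTION_WEIGHTS.get(name, SECTION_WEIGHTS["other"])
        for gene, n in count_genes(text).items():
            scores[gene] += weight * n
    return scores


# Made-up example text, not taken from any real paper.
paper = {
    "abstract": "We show that daf-16 acts downstream of daf-2.",
    "results": "Loss of daf-16 suppressed daf-2 phenotypes; daf-16 was required.",
    "methods": "Strains carrying unc-22 were used as controls.",
}
scores = score_genes(paper)
for gene, score in scores.most_common():
    print(gene, score)
```

A non-sectioned paper could be passed in as a single "other" section, which reduces the score to plain gene frequency (item 2). A cutoff on the score would still have to be chosen per publication type.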

