Difference between revisions of "WormBase Paper Categorization"

From WormBaseWiki

Revision as of 18:50, 19 July 2013

WormBase Paper Categorization

Statement of Purpose

To improve curation efficiency and provide distinct curation milestones, we would like to classify the C. elegans corpus into biologically relevant categories. These categories could represent biological processes, molecular functions, anatomy terms, disease relevance, signaling pathways, phenotypes, or any other distinction that proves useful as we investigate this organizational scheme. This Wiki page collects and organizes our thoughts and proposals on how best to organize the C. elegans corpus (or the nematode literature in general), including but not limited to:

  1. Defining the fundamental categories and/or hierarchies used to categorize papers
  2. Identifying methods or approaches for assigning individual papers to these categories
  3. Choosing a category as a curation priority
  4. Determining our goals for tackling a category or topic
  5. Distributing the curation efforts among curators

Categorical Schemes

Categories could be devised or arranged in a number of ways. Current or proposed approaches include:

  • Using WormBook chapters as a basis for categories
  • Pathways (Wnt, MAPK, TGF, Ras, etc.)
  • Processes (Aging, Sex Determination, Meiosis)
  • Phenotypes
  • Commonly referenced gene sets

Methods & Approaches for Paper Categorization

  • Collecting (manually) lists of relevant keywords, and performing (manual) Textpresso or PubMed searches
  • Running Textpresso scripts
    • SVM-based, supervised vs. unsupervised, keywords
    • Requires positives and negatives for training
  • Collecting common keywords from papers in our Author First Pass list of papers
    • Yuling has run scripts to determine word frequencies among these papers
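
The keyword-based approaches listed above could be prototyped roughly as follows. This is a minimal sketch, not the Textpresso scripts or the word-frequency scripts mentioned above: the category names, keyword lists, stopword list, and function names are all hypothetical and chosen only for illustration. It covers the two simplest ideas (counting word frequencies across a set of papers, and assigning a paper to categories by keyword hits) and does not attempt the SVM-based approach.

```python
import re
from collections import Counter

# Hypothetical category -> keyword lists; real lists would be collected by curators.
CATEGORY_KEYWORDS = {
    "Wnt pathway": ["wnt", "frizzled", "beta-catenin"],
    "Aging": ["aging", "lifespan", "longevity"],
    "Meiosis": ["meiosis", "meiotic", "synapsis"],
}

# Tiny illustrative stopword list (an assumption; a real one would be much longer).
STOPWORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "that", "with"}


def tokenize(text):
    """Lowercase the text and split it into tokens, keeping internal hyphens
    so that gene names such as 'daf-2' stay in one piece."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def word_frequencies(texts):
    """Count non-stopword tokens across a collection of paper texts."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts


def categorize(text, min_hits=1):
    """Return {category: keyword hit count} for every category whose
    keywords occur at least min_hits times in the text."""
    tokens = Counter(tokenize(text))
    scores = {}
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(tokens[k] for k in keywords)
        if hits >= min_hits:
            scores[category] = hits
    return scores


# Made-up abstract snippets, used only to exercise the functions.
papers = {
    "paper1": "Wnt signaling through Frizzled controls cell migration.",
    "paper2": "Reduced insulin signaling extends lifespan and delays aging.",
}

for paper_id, abstract in papers.items():
    print(paper_id, categorize(abstract))

print(word_frequencies(papers.values()).most_common(3))
```

Raising min_hits makes assignment stricter. Papers that a sketch like this assigns, or fails to assign, to a category could in principle serve as candidate positives and negatives for training a supervised classifier, though they would need manual review first.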


Choosing a Curation Priority

Goals for a Curation Milestone

Distributing Curation Efforts