WormBase Paper Categorization
Revision as of 19:45, 5 August 2013
Statement of Purpose
To improve curation efficiency, provide distinct curation milestones, and give curators a common goal, we would like to classify the C. elegans corpus into biologically relevant categories. These categories could represent biological processes, molecular functions, anatomy terms, disease relevance, signaling pathways, phenotypes, or any other categorical distinction we see as appropriate as we investigate this organizational scheme. This Wiki page is intended to collect and organize our thoughts and proposals on how best to organize the C. elegans corpus (or the nematode literature in general), including but not limited to:
- What fundamental categories and/or hierarchies we can create to categorize papers
- Methods or approaches for assigning individual papers to these categories
- How we go about choosing a category as a curation priority
- Determining what our goals should be for tackling a category or topic
- Distributing the curation efforts among curators
Categorical Schemes
Categories could be devised or arranged in a number of ways. Current or proposed approaches include:
- Using WormBook chapters as a basis for categories
- Pathways (Wnt, MAPK, TGF, Ras, etc.)
- Processes (Aging, Sex Determination, Meiosis)
- Phenotypes
- Commonly referenced gene sets
- Topics commonly studied in C. elegans due to its advantages as a model system
Pilot Category Tree
WormBook Chapter categories
- Genetics and Genomics
  - Genetics
    - Complementation
    - Essential genes
    - Gene duplications and genetic redundancy
    - Genetic enhancers
    - Genetic mosaics
    - Genetic suppression
    - Karyotype, ploidy, and gene dosage
    - Natural variation and population genetics
  - Genomics
    - Network biology
    - Noncoding RNA genes
    - Genomic classification of protein-coding gene families
    - Germline genomics
    - Mitochondrial genetics
    - Nematode genome evolution
    - Gene structure
    - Transposons
- Developmental Control
  - Asymmetric cell division and axis formation in the embryo
  - The C. elegans intestine
  - The C. elegans pharynx: a model for organogenesis
  - Embryological variation during nematode development
  - Epidermal morphogenesis
  - Gastrulation in C. elegans
  - Hermaphrodite cell-fate specification
  - Notch signaling in the C. elegans embryo
  - Programmed cell death
  - Translational control of maternal RNAs
- Signal Transduction
  - Protein Kinases
  - Canonical RTK-Ras-ERK signaling and related alternative pathways
  - Eph receptor signaling
  - Heterotrimeric G proteins
  - (Homologs of the) Hh signaling network
  - Notch signaling
  - Nuclear hormone receptors
  - Signaling in the immune response
  - Small GTPases
  - TGF-Beta signaling
  - Putative chemoreceptor families (in C. elegans)
  - Wnt signaling
- Molecular Biology
  - MicroRNAs
  - Noncoding RNA genes
  - DNA repair
  - Gene expression changes associated with aging
  - Genomic classification of protein-coding gene families
  - Germline chromatin
  - Mechanisms and regulation of translation
  - Gene structure
  - Pre-mRNA splicing and its regulation
  - RNA-binding proteins
  - Roles of chromatin factors in development
  - Transcription mechanisms
  - Transcriptional regulation of gene expression
  - Translational control of maternal RNAs
  - Trans-splicing and operons
  - Transposons
  - Ubiquitin-mediated pathways
  - X-chromosome dosage compensation
- Post-embryonic Development
- Neurobiology and Behavior
  - Behavior
  - Development
  - Function
  - Neurotransmitters
  - Sensory modalities
- Biochemistry
- Sex Determination
- Evolution and Ecology
- Cell Biology
- The Germ Line
- Disease Models and Drug Discovery
Ad hoc category trees
Biological Processes
- Development
  - Development by stages
    - Germline development
    - Embryonic development
    - Larval development
    - Aging
  - Development by tissue
    - Ectoderm development
      - Nervous system development
      - Epithelial system development
    - Mesoderm development
      - Muscle development
    - Endoderm development
      - Intestine development
- Cell Biological Processes
  - Cell Cycle
    - Mitosis
    - Meiosis
- Behavior
  - Chemotaxis
  - Thermotaxis
  - Touch response
  - Mating
- Gene Expression
  - Transcription and its regulation
  - Post-transcriptional regulation
  - Translation and its regulation
  - Post-translational regulation
Pathways
- Wnt signaling
- TGF signaling
- Insulin-like signaling
- Ras signaling
Biological Entities
- Anatomy terms
- Genomic elements
  - Genes
    - Coding genes
      - Genes by protein domain
      - Genes by protein complex
    - Non-coding genes
  - Regulatory elements
- Cellular components
  - Nucleus
  - Plasma membrane
  - Endosome
  - Lysosome
  - Endoplasmic reticulum
  - Golgi apparatus
  - Cytoskeleton
  - Mitochondria
Methods & Approaches for Paper Categorization
- Manually collecting lists of relevant keywords and performing manual Textpresso or PubMed searches
- Running Textpresso scripts
- Automated classification (SVM-based, supervised vs. unsupervised, keyword-based)
  - Requires positive and negative examples for training
  - Chris is in the process of assembling all WormBook articles for an initial training round
- Collecting common keywords from papers in our Author First Pass list of papers
  - Yuling has run scripts to determine word frequencies among these papers
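As a rough illustration of the keyword and word-frequency approaches listed above, here is a minimal Python sketch. It is not the Textpresso scripts or the scripts mentioned above; the keyword lists, category names, and function names (`CATEGORY_KEYWORDS`, `tokenize`, `word_frequencies`, `assign_categories`) are hypothetical and chosen only for the example. A real keyword list would be assembled by curators.

```python
import re
from collections import Counter

# Hypothetical keyword lists for illustration only.
CATEGORY_KEYWORDS = {
    "Wnt signaling": {"wnt", "frizzled", "beta-catenin"},
    "Aging": {"lifespan", "longevity", "aging"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping hyphenated words whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def word_frequencies(texts):
    """Count how often each token occurs across a collection of paper texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def assign_categories(text, keywords=CATEGORY_KEYWORDS, min_hits=1):
    """Return the categories whose keyword list shares at least min_hits distinct tokens with the text."""
    tokens = set(tokenize(text))
    return sorted(
        category
        for category, words in keywords.items()
        if len(tokens & words) >= min_hits
    )
```

Raising `min_hits` makes the assignment stricter, at the cost of missing papers that mention a topic only once. The output of `word_frequencies` over a set of papers already known to belong to a category could suggest candidate keywords for that category.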
Choosing a Curation Priority
Once we are satisfied with one or more categorization schemes, we may want to select a single category on which curators focus their efforts. Our criteria for choosing a category may depend on a number of factors:
- Number of papers in each data type backlog
- Number of papers in a category
- Distribution of data types (or required curator effort) for papers in a category
- Current representation of category topic in WormBase
  - Highly represented: we may want to "polish off" what we have to generate a complete picture
  - Poorly represented: may be low-hanging fruit for covering new topics
- Current representation of gene function for genes represented in category
  - We could focus on genes with little or no known function
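One way to combine factors like these is a simple weighted score per category. The sketch below is only an illustration under stated assumptions: the `CategoryStats` fields, the weights, and the linear scoring formula are all hypothetical, and it uses just two of the factors above (backlog size and number of genes with little or no known function).

```python
from dataclasses import dataclass


@dataclass
class CategoryStats:
    name: str
    backlog_papers: int          # papers in the category still awaiting curation
    genes_without_function: int  # genes in the category with little or no known function


def priority_score(stats, w_backlog=1.0, w_genes=0.5):
    """Weighted sum of the two factors; the weights are arbitrary placeholders."""
    return w_backlog * stats.backlog_papers + w_genes * stats.genes_without_function


def rank_categories(categories, **weights):
    """Return the categories sorted from highest to lowest priority score."""
    return sorted(
        categories,
        key=lambda c: priority_score(c, **weights),
        reverse=True,
    )
```

Changing the weights changes the ranking, so the weights themselves would need to be agreed on by curators before a score like this could guide the choice of category.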
Goals for a Curation Milestone
Some goals that have been discussed are:
- Completing curation backlog for all data types for a given category
- Completing curation of human disease-relevance for a given category
- Generating (or filling out) a WormBase Process page and WikiPathway
- Goals could be set for each curation upload (roughly every two months)
- We can post the results of the milestone on the WormBase homepage/blog