WormBase Paper Categorization
Revision as of 18:14, 18 October 2013
Contents
- 1 Statement of Purpose
- 2 Categorical Schemes
- 2.1 Pilot Category Trees
- 2.1.1 WormBook Chapter categories
- 2.1.1.1 Genetics and Genomics
- 2.1.1.2 Developmental Control
- 2.1.1.3 Signal Transduction
- 2.1.1.4 Molecular Biology
- 2.1.1.5 Post-embryonic Development
- 2.1.1.6 Neurobiology and Behavior
- 2.1.1.7 Biochemistry
- 2.1.1.8 Sex Determination
- 2.1.1.9 Evolution and Ecology
- 2.1.1.10 Cell Biology
- 2.1.1.11 The Germ Line
- 2.1.1.12 Disease Models and Drug Discovery
- 2.1.2 Ad hoc category trees
- 2.1.3 Stress categories
- 3 Methods & Approaches for Paper Categorization
- 4 Choosing a Curation Priority
- 5 Goals for a Curation Milestone
- 6 Distributing Curation Efforts
Statement of Purpose
To improve curation efficiency, provide distinct curation milestones, and create a common goal for curators, we would like to classify the C. elegans corpus into biologically relevant categories. These categories could represent biological processes, molecular functions, anatomy terms, disease relevance, signaling pathways, phenotypes, or any other categorical distinction we find appropriate as we investigate this organizational scheme. This Wiki page is intended to collect and organize our thoughts and proposals on how best to organize the C. elegans corpus (or the nematode literature in general), including but not limited to:
- What fundamental categories and/or hierarchies we can create to categorize papers
- Methods or approaches for assigning individual papers to these categories
- How we go about choosing a category as a curation priority
- Determining what our goals should be for tackling a category or topic
- Distributing the curation efforts among curators
Categorical Schemes
Categories could be devised or arranged in a number of ways. Current or proposed approaches include:
- Using WormBook chapters as a basis for categories
- Pathways (Wnt, MAPK, TGF, Ras, etc.)
- Processes (Aging, Sex Determination, Meiosis)
- Phenotypes
- Commonly referenced gene sets
- Topics commonly studied in C. elegans due to its advantages as a model system
Pilot Category Trees
WormBook Chapter categories
Genetics and Genomics
- Genetics
- Complementation
- Essential genes
- Gene duplications and genetic redundancy
- Genetic enhancers
- Genetic mosaics
- Genetic suppression
- Karyotype, ploidy, and gene dosage
- Natural variation and population genetics
- Genomics
- Network biology
- Noncoding RNA genes
- Genomic classification of protein-coding gene families
- Germline genomics
- Mitochondrial genetics
- Nematode genome evolution
- Gene structure
- Transposons
Developmental Control
- Asymmetric cell division and axis formation in the embryo
- The C. elegans intestine
- The C. elegans pharynx: a model for organogenesis
- Embryological variation during nematode development
- Epidermal morphogenesis
- Gastrulation in C. elegans
- Hermaphrodite cell-fate specification
- Notch signaling in the C. elegans embryo
- Programmed cell death
- Translational control of maternal RNAs
Signal Transduction
- Protein Kinases
- Canonical RTK-Ras-ERK signaling and related alternative pathways
- Eph receptor signaling
- Heterotrimeric G proteins
- (Homologs of the) Hh signaling network
- Notch signaling
- Nuclear hormone receptors
- Signaling in the immune response
- Small GTPases
- TGF-Beta signaling
- Putative chemoreceptor families (in C. elegans)
- Wnt signaling
Molecular Biology
- MicroRNAs
- Noncoding RNA genes
- DNA repair
- Gene expression changes associated with aging
- Genomic classification of protein-coding gene families
- Germline chromatin
- Mechanisms and regulation of translation
- Gene structure
- Pre-mRNA splicing and its regulation
- RNA-binding proteins
- Roles of chromatin factors in development
- Transcription mechanisms
- Transcriptional regulation of gene expression
- Translational control of maternal RNAs
- Trans-splicing and operons
- Transposons
- Ubiquitin-mediated pathways
- X-chromosome dosage compensation
Post-embryonic Development
- Dauer
- Evolution of development in nematodes related to C. elegans
- Gene expression changes associated with aging
- Hermaphrodite cell-fate specification
- Male development
- The measurement and analysis of age-related changes
- Morphogenesis of the vulva and the vulval-uterine connection
- Roles of chromatin factors in development
- Vulval development
Neurobiology and Behavior
- Behavior
- Feeding
- Egg-laying
- Male mating behavior
- Learning and memory
- Development
- Neurogenesis
- Synaptogenesis
- Glia
- Function
- Potassium channels
- Putative chemoreceptor families (in C. elegans)
- Sensory cilia
- Neurotransmitters
- Acetylcholine
- Biogenic amine neurotransmitters
- Ethanol
- GABA
- Ionotropic glutamate receptors: genetics, behavior, and electrophysiology
- Neuropeptides
- Sensory modalities
- Chemosensation
- Mechanosensation
Biochemistry
- Ascaroside signaling
- Carbohydrates and glycosylation
- Intermediary metabolism
- The eggshell in the C. elegans oocyte-to-embryo transition
- Mitochondrial Unfolded Protein Response (UPR)
- Model animals for the study of oxidative stress from complex II
- Reproduction, fat metabolism, and life span: what is the connection?
- A worm rich in protein: Quantitative, differential, and global proteomics
Sex Determination
- Hermaphrodite cell-fate specification
- Male development
- Morphogenesis of the vulva and the vulval-uterine connection
- Sex determination in the germ line
- Somatic sex determination
- The evolution of nematode sex determination
- Vulval development
- X-chromosome dosage compensation
Evolution and Ecology
- Nematode diversity and phylogeny
- Ecology of Caenorhabditis species
- Evolution of development in nematodes related to C. elegans
- The evolution of nematode sex determination
- Interactions with microbial pathogens
- Molecular evolution inferences from the C. elegans genome
- Natural variation and population genetics
- Nematode genome evolution
- The phylogenetic relationships of Caenorhabditis and other rhabditids
- Genomics and biology of the nematode Caenorhabditis briggsae
- The biology and genome of Heterorhabditis bacteriophora
- Oscheius tipulae
- Pristionchus pacificus
- Strongyloides spp.
- Biology and genome of Trichinella spiralis
Cell Biology
- Asymmetric cell division and axis formation in the embryo
- Autophagy
- Basement membranes
- The cadherin superfamily
- Carbohydrates and glycosylation
- Cell cycle regulation
- Cell division
- Cell fusions
- The cuticle
- Epidermal morphogenesis
- Epithelial junctions and attachments
- Gastrulation in C. elegans
- Intracellular trafficking
- Mitochondrial genetics
- Potassium channels
- Programmed cell death
- Sarcomere assembly in C. elegans muscle
- Sensory cilia
- Sperm motility and MSP
- Spermatogenesis
- Synaptogenesis
The Germ Line
- Control of oocyte meiotic maturation and fertilization
- Germline chromatin
- Germline genomics
- Germline proliferation and its control
- Germline survival and apoptosis
- Specification of the germ line
- Spermatogenesis
Disease Models and Drug Discovery
- C. elegans and volatile anesthetics
- Anthelmintic drugs
- Obesity and the regulation of fat metabolism
Ad hoc category trees
Biological Processes
- Development
- Development by stages
- Germline development
- Embryonic development
- Larval development
- Aging
- Development by tissue
- Ectoderm development
- Nervous system development
- Epithelial system development
- Mesoderm development
- Muscle development
- Endoderm development
- Intestine development
- Cell Biological Processes
- Cell Cycle
- Mitosis
- Meiosis
- Behavior
- Chemotaxis
- Thermotaxis
- Touch response
- Mating
- Gene Expression
- Transcription and its regulation
- Post-transcriptional regulation
- Translation and its regulation
- Post-translational regulation
Pathways
- Wnt signaling
- TGF signaling
- Insulin-like signaling
- Ras signaling
Biological Entities
- Anatomy terms
- Genomic elements
- Genes
- Coding genes
- Genes by protein domain
- Genes by protein complex
- Non-coding genes
- Regulatory elements
- Cellular components
- Nucleus
- Plasma membrane
- Endosome
- Lysosome
- Endoplasmic reticulum
- Golgi apparatus
- Cytoskeleton
- Mitochondria
Stress categories
Stress
- Stress Types
- Chemotoxic/Xenobiotic stress
- Desiccation, Dehydration
- Electrophilic stress (lipoperoxidation, 4-hydroxynonenal (4-HNE))
- Genotoxicity, DNA damage, Replicative stress
- Glucose stress
- Heavy metals
- Injury/Trauma
- Ischemia
- Mechanical stress
- Shear stress
- Metabolic/metabolite stress
- Misfolded/Unfolded protein stress, Proteotoxicity, Endoplasmic reticulum (ER) stress, Mitochondrial (mit) stress
- Beta-amyloid aggregation
- Polyglutamine aggregation
- Mitochondrial stress
- Osmotic, Salt, Hypertonic stress
- Oxidative stress, Hypoxia/Anoxia, Hyperoxia
- Reactive Oxygen Species (ROS)
- Peroxides
- Pathogen stress (a whole field)
- Starvation
- Glucose, ATP depletion
- Thermal stress (heat/cold shock)
- UV, Gamma, X-Ray irradiation
- Stress Topics
- Hormesis
- Stress fibers
- Stress granules
- Stress response
- Mitochondrial stress response
- Stress tolerance/resistance
- Stress adaptation
- Thermotolerance
- Stress reporters
- Pgst-4::GFP
- Phsp-4::GFP
- Phsp-16.2::GFP
- Psod-3::GFP
Methods & Approaches for Paper Categorization
- Manually collecting lists of relevant keywords and performing manual Textpresso or PubMed searches
- Running Textpresso scripts
- SVM-based, supervised vs. unsupervised, keywords
- Requires positives and negatives for training
- Chris is in the process of assembling all WormBook articles for an initial training round
- Collecting common keywords from papers in our Author First Pass list of papers
- Yuling has run scripts to determine word frequencies among these papers
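The keyword-based approaches listed above could be prototyped roughly as follows. This is only a minimal sketch, not the Textpresso or SVM scripts mentioned in the list: the keyword lists, the example abstracts, and the function names `word_frequencies` and `assign_categories` are all hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword lists; real lists would be assembled by curators.
CATEGORY_KEYWORDS = {
    "Unfolded protein response": [
        "unfolded protein response", "er stress", "xbp-1", "ire-1",
    ],
    "Wnt signaling": ["wnt", "beta-catenin"],
}

# Token pattern that keeps gene-style names such as "hsp-16.2" in one piece.
TOKEN = re.compile(r"[a-z0-9]+(?:[-.][a-z0-9]+)*")


def word_frequencies(texts):
    """Count how often each lowercase token occurs across a set of texts."""
    counts = Counter()
    for text in texts:
        counts.update(TOKEN.findall(text.lower()))
    return counts


def assign_categories(text, category_keywords=CATEGORY_KEYWORDS, min_hits=1):
    """Return the categories whose keywords occur at least min_hits times."""
    lowered = text.lower()
    matched = []
    for category, keywords in category_keywords.items():
        hits = sum(
            len(re.findall(r"\b" + re.escape(keyword) + r"\b", lowered))
            for keyword in keywords
        )
        if hits >= min_hits:
            matched.append(category)
    return matched


# Made-up abstracts, used only to exercise the functions.
example_abstracts = {
    "paper_A": "The IRE-1 and XBP-1 branch of the unfolded protein "
               "response is induced by ER stress.",
    "paper_B": "Wnt signaling controls asymmetric cell division.",
}

for paper_id, abstract in example_abstracts.items():
    print(paper_id, assign_categories(abstract))
```

Plain keyword matching like this would probably over- and under-assign papers in practice, so it is better seen as a way to generate candidate lists (or training examples for a supervised classifier) than as a final categorization.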
Choosing a Curation Priority
Once we are satisfied with one or more categorization schemes, we may want to select a single category on which curators can focus their efforts. Our criteria for choosing a category may depend on a number of factors:
- Number of papers in each data type backlog
- Number of papers in a category
- Distribution of data types (or required curator effort) for papers in a category
- Current representation of category topic in WormBase
- Highly represented: we may want to "polish off" what we have to generate a complete picture
- Poorly represented: may be low-hanging fruit for covering new topics
- Current representation of gene function for genes represented in category
- We could focus on genes with little or no known function
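One way to compare candidate categories against the factors above is to tabulate a few counts per category and sort on whichever criterion is being discussed. The sketch below is only an illustration under stated assumptions: the `CategorySummary` fields, the `backlog_fraction` measure, and every number in the example are invented placeholders, not WormBase data or an agreed scoring method.

```python
from dataclasses import dataclass


@dataclass
class CategorySummary:
    """Per-category counts (hypothetical fields, for illustration only)."""
    name: str
    papers: int                  # papers assigned to the category
    backlog_papers: int          # of those, papers still awaiting curation
    genes_without_function: int  # genes in the category with no known function


def backlog_fraction(summary):
    """Fraction of a category's papers that are still in a curation backlog."""
    if summary.papers == 0:
        return 0.0
    return summary.backlog_papers / summary.papers


def rank_categories(summaries, key=backlog_fraction, reverse=True):
    """Sort categories by the chosen criterion (largest first by default)."""
    return sorted(summaries, key=key, reverse=reverse)


# Placeholder numbers, invented purely to show the sorting.
example_summaries = [
    CategorySummary("Category B", papers=40, backlog_papers=10,
                    genes_without_function=3),
    CategorySummary("Category A", papers=100, backlog_papers=80,
                    genes_without_function=12),
    CategorySummary("Category C", papers=0, backlog_papers=0,
                    genes_without_function=0),
]

for summary in rank_categories(example_summaries):
    print(summary.name, round(backlog_fraction(summary), 2))
```

Passing a different `key`, for example `lambda s: s.genes_without_function`, would rank the same categories by the "little or no known function" criterion instead.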
Pilot Topic: Unfolded Protein Response (UPR)
UPR Papers
List of WBPaper IDs for candidate UPR papers:
WBPaper00004394 WBPaper00005036 WBPaper00005044 WBPaper00005432 WBPaper00006537 WBPaper00013244 WBPaper00024210 WBPaper00024269 WBPaper00024285 WBPaper00025202 WBPaper00026830 WBPaper00026929 WBPaper00026939 WBPaper00027723 WBPaper00028479 WBPaper00030877 WBPaper00030999 WBPaper00031084 WBPaper00031456 WBPaper00031857 WBPaper00031951 WBPaper00031985 WBPaper00032003 WBPaper00032177 WBPaper00032192 WBPaper00032255 WBPaper00032443 WBPaper00032521 WBPaper00032977 WBPaper00033126 WBPaper00035154 WBPaper00035294 WBPaper00035302 WBPaper00035409 WBPaper00035580 WBPaper00035985 WBPaper00036008 WBPaper00036076 WBPaper00036115 WBPaper00036256 WBPaper00037064 WBPaper00037750 WBPaper00037991 WBPaper00038075 WBPaper00038219 WBPaper00038304 WBPaper00039930 WBPaper00039938 WBPaper00039975 WBPaper00040397 WBPaper00040453 WBPaper00040582 WBPaper00040692 WBPaper00040801 WBPaper00040821 WBPaper00041022 WBPaper00041065 WBPaper00041076 WBPaper00041159 WBPaper00041163 WBPaper00041212 WBPaper00041269 WBPaper00041295 WBPaper00041370 WBPaper00041512 WBPaper00041568 WBPaper00041851 WBPaper00041954 WBPaper00041961 WBPaper00042060 WBPaper00042074 WBPaper00042171 WBPaper00042217
List of PMIDs for papers that do not yet (as of entry) have a WBPaper ID:
PMID: 23870130
UPR WikiPathways
Here are WikiPathways currently under development to represent the various branches of the UPR pathway:
Endoplasmic Reticulum UPR: http://www.wikipathways.org/index.php/Pathway:WP2578
Mitochondrial UPR: http://www.wikipathways.org/index.php/Pathway:WP525
UPR GO Terms and Associated Genes
GO type: Biological process
Term: UFP-specific transcription factor mRNA processing during unfolded protein response
GO ID: GO:0030969
Gene Associations: No associations in WB
GO type: Biological process
Term: positive regulation of gene-specific transcription involved in unfolded protein response
GO ID: GO:0006990
Gene Associations: pek-1
GO type: Biological process
Term: activation of signaling protein activity involved in unfolded protein response
GO ID: GO:0006987
Gene Associations: No associations in WB
GO type: Biological process
Term: endoplasmic reticulum unfolded protein response
GO ID: GO:0030968
Gene Associations: abu-1, atfs-1, C14B9.2, cdc-48.1, cdc-48.2, crp-1, crt-1, cup-2, hsp-16.1, hsp-16.11, hsp-16.2, hsp-16.41, hsp-16.49, hsp-3, hsp-4, hsp-6, hsp-60, hsp-70, ire-1, pdi-2, R151.6, R151.7, rnf-121, sca-1, spg-7, T14G8.3, uggt-1, uggt-2, xbp-1
GO type: Biological process
Term: mitochondrial unfolded protein response
GO ID: GO:0034514
Gene Associations: atfs-1, haf-1, hsp-6, hsp-60, ubl-5
GO type: Biological process
Term: response to unfolded protein
GO ID: GO:0006986
Gene Associations: No associations in WB
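For a quick overview, the GO term listing above can be held in a small lookup table. The following sketch copies the GO IDs and gene associations from the list as recorded on this page; the helper names (`terms_without_genes`, `shared_genes`, `all_genes`) are hypothetical and not part of any WormBase tool.

```python
# GO ID -> associated genes, copied from the listing on this page.
UPR_GO_ASSOCIATIONS = {
    "GO:0030969": [],
    "GO:0006990": ["pek-1"],
    "GO:0006987": [],
    "GO:0030968": [
        "abu-1", "atfs-1", "C14B9.2", "cdc-48.1", "cdc-48.2", "crp-1",
        "crt-1", "cup-2", "hsp-16.1", "hsp-16.11", "hsp-16.2", "hsp-16.41",
        "hsp-16.49", "hsp-3", "hsp-4", "hsp-6", "hsp-60", "hsp-70", "ire-1",
        "pdi-2", "R151.6", "R151.7", "rnf-121", "sca-1", "spg-7", "T14G8.3",
        "uggt-1", "uggt-2", "xbp-1",
    ],
    "GO:0034514": ["atfs-1", "haf-1", "hsp-6", "hsp-60", "ubl-5"],
    "GO:0006986": [],
}


def terms_without_genes(associations):
    """Return the GO IDs that have no gene associations, sorted."""
    return sorted(go_id for go_id, genes in associations.items() if not genes)


def shared_genes(associations, first_id, second_id):
    """Return the genes annotated to both GO terms, sorted."""
    return sorted(set(associations[first_id]) & set(associations[second_id]))


def all_genes(associations):
    """Return every gene associated with at least one term, sorted."""
    return sorted({gene for genes in associations.values() for gene in genes})


print("Terms without associations:", terms_without_genes(UPR_GO_ASSOCIATIONS))
print("ER and mitochondrial UPR:",
      shared_genes(UPR_GO_ASSOCIATIONS, "GO:0030968", "GO:0034514"))
print("Distinct genes:", len(all_genes(UPR_GO_ASSOCIATIONS)))
```

Terms with no associations and genes shared between the ER and mitochondrial UPR terms are the kind of gaps and overlaps a pilot curation effort on this topic might want to review first.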
Goals for a Curation Milestone
Some goals that have been discussed are:
- Completing curation backlog for all data types for a given category
- Completing curation of human disease-relevance for a given category
- Generating (or filling out) a WormBase Process page and WikiPathway
- Goals could be set for each curation upload (every ~two months)
- We can post the results of the milestone on the WormBase homepage/blog