Difference between revisions of "WormBase Literature Curation Workflow"

From WormBaseWiki

Latest revision as of 18:30, 29 August 2012

WormBase Curated Data Types and Curation Methods

Please note that this page may be updated periodically as both the data types curated and the curation methods used change and improve.

Abbreviations Used:

CGC - Caenorhabditis Genetics Center

ChEBI - Chemical Entities of Biological Interest

SMID-DB - Small Molecule Identifier DataBase

GEO - Gene Expression Omnibus

GO - Gene Ontology

HMM - Hidden Markov Model

PFM - Position Frequency Matrix

PWM - Position Weight Matrix

SO - Sequence Ontology

SVM - Support Vector Machine

WAO - Worm Anatomy Ontology

WPO - Worm Phenotype Ontology

{| {{table border ="1"}}
| align="center" style="background:#f0f0f0;"|'''Data Types Curated from the Literature'''
| align="center" style="background:#f0f0f0;"|'''Data Type Description'''
| align="center" style="background:#f0f0f0;"|'''Paper Flagging Methods (triage)'''
| align="center" style="background:#f0f0f0;"|'''Data Extraction Methods'''
| align="center" style="background:#f0f0f0;"|'''Entities Involved'''
| align="center" style="background:#f0f0f0;"|'''Ontologies or Controlled Vocabularies Used'''
|-
|'''Genes and Genetics'''
|-
|Genes studied||Genes for which experimental results or sequence-related analyses are reported are linked to the publication||n/a - applies to all publications||Perl script (abstract only), Manual curation: Author, Curator||Genes, CGC names, Sequence names, Other names (synonyms)||
|-
|Genes cloned||Molecular characterization of a locus||SVM (variation sequence change), Manual curation: Author||Manual curation||WormBase Gene IDs, CGC (Caenorhabditis Genetics Center) names, Sequence names, Other names (synonyms)||
|-
|Variations: Allele||Mutations identified in forward or reverse genetic screens||Textpresso (cats or script?)||Textpresso (cats or script?)||Variations, Genes, CDSs, Transcripts||
|-
|Strains||Laboratory and natural isolates of nematode strains||Manual curation: Author, Curator||Manual curation: Curator||Gene, Variation||
|-
|Genetic mapping||2- and 3-factor mapping data, chromosomal deficiency breakpoints||||Manual curation||Gene, Variation||Are any deficiencies annotated with phenotypes? '''Yes, as rearrangements'''
|-
|Chromosomal rearrangements||Nomenclature, genetic boundaries||||Manual curation||Gene, Strain, Variation||WPO
|-
|Transgenes||Genomic constructs used as reporters to mark tissues and subcellular structures||Textpresso (regular expression script)||Manual curation||Gene||
|-
|''C. elegans'' human disease gene homologs||Identification of ''C. elegans'' genes homologous to human genes associated with disease, gene model tag and incorporated into free text descriptions||Textpresso: category searches||Manual curation: Curator||Gene, Protein, Disease||MeSH (?), Disease Ontology (?)
|-
|'''Gene Function'''
|-
|Phenotype analysis||Annotation of phenotypes resulting from genomic perturbations or natural variation||SVM, Author||Manual curation||Variations, Rearrangements, Transgenes, Natural Variants||WAO, WPO, Life Stage Ontology, Molecule Controlled Vocabulary
|-
|Molecules (e.g. chemicals, drugs, small molecules)||Curation of molecules used to study behavior, physiology, gene function, etc.||Manual curation: Author, Curator||Manual curation: Curator||Molecules, Variation, Strain, Transgene, RNAi, Rearrangement||WPO, MeSH, ChEBI, SMID-DB, CasRN
|-
|RNAi experiments||Annotation of the sequences used for, and the phenotypes resulting from, RNA interference experiments||SVM||Manual curation: Curator (Textpresso cats?)||Sequence, Gene||WPO
|-
|Time of action||Is this part of phenotype curation? '''This is not part of phenotype curation'''||||||||
|-
|Gene Ontology (GO): Biological Process||Annotation of gene products to GO biological process terms based upon mutant phenotypes, in vitro assays||SVM, Manual curation: Author, Curator||Manual curation: Curator, Phenotype2GO Pipeline||Genes, Variations, RNAi Experiments||GO, WAO, WPO
|-
|Gene Ontology (GO): Molecular Function||Annotation of gene products to GO molecular function terms based upon in vitro assays, mutant phenotypes||SVM, HMM, Manual curation: Author, Curator||Semi-automated manual curation: HMM, Textpresso category searches, Curator||Genes, Variations, RNAi Experiments||GO, ChEBI
|-
|Genetic interactions||Annotation of phenotype and type of interaction (e.g. suppression, enhancement) as a result of two or more genetic perturbations||SVM, other phenotype-related flags (e.g. RNAi, otherexpr)||Textpresso categories, Manual curation: Curator||Genes, Variations, RNAi experiments||Gene, WPO, Genetic Interaction Ontology (in progress)
|-
|Functional complementation||Phenotypic rescue of mutant Gene A as a result of Gene B expression||Manual curation: Author, Curator||Manual curation: Author||Gene, Transgene, Variation||WPO
|-
|Gene product interactions (physical interactions)||Gene product interactions with other gene products (protein, nucleic acids), some overlap with GO MF curation||SVM, Manual curation: Author||Semi-automated manual curation: Textpresso category searches, Curator||Genes, Proteins, Sequences||GO, BioGRID Experimental Systems Vocabulary
|-
|Concise descriptions||Free text descriptions that summarize key biological information about a gene||SVM, Textpresso category searches, Manual curation: Author, Curator||Textpresso category searches, Manual curation: Curator||Genes, OMIM Database Identifiers||
|-
|'''Gene Expression'''
|-
|Antibodies - ''C. elegans''||Non-commercial antibodies generated against ''C. elegans'' antigens||Textpresso (script), SVM||Manual curation||Gene, Laboratory||
|-
|Gene expression pattern||Temporal and/or spatial expression patterns for genes, transcripts, proteins||SVM, Manual curation: Author, Curator||Manual curation: Curator||Genes, Transcripts, Proteins, Antibodies, Images||GO, Life stage, WAO
|-
|Expression pattern images||Images of expression patterns from published papers, laboratories||Manual curation: Curator||Image extraction: Textpresso, Perl scripts. Manual curation: Curator||Genes, Anatomy Terms, Subcellular Localization||GO, WAO
|-
|Gene Ontology (GO): Cellular Component||Curation of the subcellular localization of gene products||SVM and Textpresso category searches||Textpresso category searches||Genes, GO terms||GO
|-
|Gene regulation||Annotation of changes in gene expression (levels, temporal or spatial pattern, localization) upon genetic or environmental perturbation||SVM||Manual curation: Curator||Genes, Proteins, Antibodies, Expression Patterns, Molecules||??
|-
|Regulatory features||Curation of nucleic acid sequences that regulate gene expression||SVM||Manual curation: Curator||Genes, Sequences, ??||SO (?)
|-
|Cis-regulatory sites||Verified or predicted cis-regulatory sites as defined by PFM or PWM||Manual curation: Author, Curator||Manual curation: Curator||Sequences, ??||SO (?)
|-
|Microarray data||Microarray data are imported from GEO and ArrayExpress, mapped to ''C. elegans'' genomic sequence||Manual curation: Author, Curator||Manual curation: Curator||Genes, Sequences, PCR products, ?||?
|-
|'''Protein structure and function'''
|-
|Protein analysis in vitro||Curation of protein function in vitro, e.g. enzymatic and transporter activities, overlaps with GO MF curation||HMMs, Manual curation: Author, Curator||HMMs, Manual curation: Curator||Genes, Proteins, Molecules (Column 16?)||GO, ChEBI
|-
|Mass spectrometry data||Mass spectrometry data used for curating gene models||SVM||Manual curation: Curator||Genes, Proteins, Sequences||SO (?)
|-
|'''Gene models, sequence changes'''
|-
|Gene structure corrections||Curation of genome sequence changes, alternative splice sites, poly(A) sites, etc.||SVM||Manual curation: Curator||Genes, Sequences||SO (?)
|-
|Allele sequence||Curation of sequences associated with allelic variations||SVM, Textpresso (cats or scripts?)||Manual curation: Curator||Genes, Sequences, Variations, PCR products (?)||
|-
|SNP sequence||Curation of new and existing single nucleotide polymorphisms||Manual curation: Author||Manual curation: Curator||Gene, Sequence, Strain||
|-
|'''Cell function'''
|-
|Cell/anatomy terms||Curation of new anatomy terms or synonyms||Textpresso, Manual curation: Author, Curator||Manual curation: Curator|| ||WAO
|-
|Ablation data||Curation of the phenotype(s) resulting from laser ablation of cells or tissues||Textpresso, Manual curation: Author||Manual curation: Curator||Anatomy terms, Phenotype terms||WAO, WPO
|-
|Tissue or cell site-of-action||Curation of the tissue or cell in which a gene product acts, by tissue- or cell type-specific expression||Textpresso, Manual curation: Author||Manual curation: Curator||Genes, Anatomy terms||WAO
|-
|Mosaic analysis||Curation of the tissue or cell in which a gene product acts, by genetic mosaicism||Textpresso, Manual curation: Author||Manual curation: Curator||Genes, Anatomy terms||WAO
|}
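
The listing above pairs each data type with a paper flagging (triage) method, a data extraction method, and the ontologies used. As an illustration only, that structure could be captured in a small lookup so that questions such as "which data types are flagged by SVM?" can be answered programmatically. The sketch below is not part of any WormBase tooling: the `DataType` class, the helper functions, and the simplified method labels are hypothetical, and only four rows are transcribed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataType:
    """One row of the listing (hypothetical structure, for illustration)."""
    name: str
    section: str
    flagging: Tuple[str, ...]      # paper flagging (triage) methods
    extraction: Tuple[str, ...]    # data extraction methods
    ontologies: Tuple[str, ...] = ()


# A few rows transcribed from the listing; method labels are simplified
# (e.g. "Manual curation: author" is reduced to "Manual curation").
ROWS = [
    DataType("RNAi experiments", "Gene Function",
             ("SVM",), ("Manual curation",), ("WPO",)),
    DataType("Gene regulation", "Gene Expression",
             ("SVM",), ("Manual curation",)),
    DataType("Ablation data", "Cell function",
             ("Textpresso", "Manual curation"), ("Manual curation",),
             ("WAO", "WPO")),
    DataType("Mosaic analysis", "Cell function",
             ("Textpresso", "Manual curation"), ("Manual curation",),
             ("WAO",)),
]


def flagged_by(method: str, rows: List[DataType] = ROWS) -> List[str]:
    """Names of data types whose triage step uses the given method."""
    return [row.name for row in rows if method in row.flagging]


def using_ontology(ontology: str, rows: List[DataType] = ROWS) -> List[str]:
    """Names of data types annotated with the given ontology."""
    return [row.name for row in rows if ontology in row.ontologies]


if __name__ == "__main__":
    print(flagged_by("SVM"))
    print(using_ontology("WPO"))
```

Keeping flagging and extraction methods as separate fields mirrors the two-step workflow in the listing, where a paper is first flagged for a data type and the data are then extracted, usually by a curator.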