WormBase Literature Curation Workflow
From WormBaseWiki
Latest revision as of 18:30, 29 August 2012
WormBase Curated Data Types and Curation Methods
Please note that this page may be updated periodically as both the data types curated and the curation methods used change and improve.
Abbreviations Used:
CGC - Caenorhabditis Genetics Center
ChEBI - Chemical Entities of Biological Interest
SMID-DB - Small Molecule Identifier DataBase
GEO - Gene Expression Omnibus
GO - Gene Ontology
HMM - Hidden Markov Model
PFM - Position Frequency Matrix
PWM - Position Weight Matrix
SO - Sequence Ontology
SVM - Support Vector Machine
WAO - Worm Anatomy Ontology
WPO - Worm Phenotype Ontology
Data Types Curated from the Literature | Data Type Description | Paper Flagging Methods (triage) | Data Extraction Methods | Entities Involved | Ontologies or Controlled Vocabularies Used |
Genes and Genetics | |||||
Genes studied | Genes for which experimental results or sequence-related analyses are reported are linked to the publication | n/a - applies to all publications | Perl script (abstract only), Manual curation: Author, Curator | Genes, CGC names, Sequence names, Other names (synonyms) | |
Genes cloned | Molecular characterization of a locus | SVM (variation sequence change), Manual curation: Author | Manual curation | WormBase Gene IDs, CGC (Caenorhabditis Genetics Center) names, Sequence names, Other names (synonyms) | |
Variations: Allele | Mutations identified in forward or reverse genetic screens | Textpresso (categories or script?) | Textpresso (categories or script?) | Variations, Genes, CDSs, Transcripts | |
Strains | Laboratory and natural isolates of nematode strains | Manual curation: Author, Curator | Manual curation: Curator | Gene, Variation | |
Genetic mapping | 2- and 3-factor mapping data, chromosomal deficiency breakpoints | | Manual curation | Gene, Variation | Are any deficiencies annotated with phenotypes? Yes, as rearrangements |
Chromosomal rearrangements | Nomenclature, genetic boundaries | | Manual curation | Gene, Strain, Variation | WPO |
Transgenes | Genomic constructs used as reporters to mark tissues and subcellular structures | Textpresso (regular expression script) | Manual curation | Gene | |
C. elegans human disease gene homologs | Identification of C. elegans genes homologous to human genes associated with disease; recorded as a gene model tag and incorporated into free text descriptions | Textpresso: category searches | Manual curation: curator | Gene, Protein, Disease | MeSH (?), Disease Ontology (?) |
Gene Function | |||||
Phenotype analysis | Annotation of phenotypes resulting from genomic perturbations or natural variation | SVM, Author | Manual curation | Variations, Rearrangements, Transgenes, Natural Variants | WAO, WPO, Life Stage Ontology, Molecule Controlled Vocabulary |
Molecules (e.g. chemicals, drugs, small molecules) | Curation of molecules used to study behavior, physiology, gene function, etc. | Manual curation - author, curator | Manual curation - curator | Molecules, Variation, Strain, Transgene, RNAi, Rearrangement | WPO, MeSH, ChEBI, SMID-DB, CasRN |
RNAi experiments | Annotation of sequences used for and phenotype resulting from RNA interference experiments | SVM | Manual curation - Curator (Textpresso cats?) | Sequence, Gene | WPO |
Time of action | Is this part of phenotype curation? This is not part of phenotype curation. | ||||
Gene Ontology (GO): Biological Process | Annotation of gene products to GO biological process terms based upon mutant phenotypes, in vitro assays | SVM, Manual Curation - Author, Curator | Manual Curation - Curator, Phenotype2GO Pipeline | Genes, Variations, RNAi Experiments | GO, WAO, WPO |
Gene Ontology (GO): Molecular Function | Annotation of gene product to GO molecular function terms based upon in vitro assays, mutant phenotypes | SVM, HMM, Manual Curation: Author, Curator | Semi-automated manual curation: HMM, Textpresso category searches, curator | Genes, Variations, RNAi Experiments | GO, ChEBI |
Genetic interactions | Annotation of phenotype and type of interaction (e.g. suppression, enhancement) as a result of two or more genetic perturbations | SVM, other phenotype-related flags (e.g. RNAi, otherexpr) | Textpresso categories, Manual curation: Curator | Genes, Variations, RNAi experiments | Gene, WPO, Genetic Interaction Ontology (in progress) |
Functional complementation | Phenotypic rescue of mutant Gene A as a result of Gene B expression | Manual curation: Author, Curator | Manual Curation: Author | Gene, Transgene, Variation | WPO |
Gene product interactions (physical interactions) | Gene product interactions with other gene products (protein, nucleic acids), some overlap with GO MF curation | SVM, Manual curation: Author | Semi-automated manual curation: Textpresso category searches, curator | Genes, Proteins, Sequences | GO, BioGRID Experimental Systems Vocabulary |
Concise descriptions | Free text descriptions that summarize key biological information about a gene | SVM, Textpresso category searches, Manual curation: author, curator | Textpresso category searches, Manual curation: curator | Genes, OMIM Database Identifiers | |
Gene Expression | |||||
Antibodies - C. elegans | Non-commercial antibodies generated against C. elegans antigens | Textpresso (script), SVM | Manual curation | Gene, Laboratory | |
Gene expression pattern | Temporal and/or spatial expression patterns for genes, transcripts, proteins | SVM, Manual curation: author, curator | Manual curation: Curator | Genes, Transcripts, Proteins, Antibodies, Images | GO, Life stage, WAO |
Expression pattern images | Images of expression patterns from published papers, laboratories | Manual curation: curator | Image extraction: Textpresso, Perl scripts. Manual curation: curator | Genes, Anatomy Terms, Subcellular Localization | GO, WAO |
Gene Ontology (GO): Cellular Component | Curation of the subcellular localization of gene products | SVM and Textpresso category searches | Textpresso category searches | Genes, GO terms | GO |
Gene regulation | Annotation of changes in gene expression (levels, temporal or spatial pattern, localization) upon genetic or environmental perturbation | SVM | Manual curation: curator | Genes, Proteins, Antibodies, Expression Patterns, Molecules | ?? |
Regulatory features | Curation of nucleic acid sequences that regulate gene expression | SVM | Manual curation: curator | Genes, Sequences, ?? | SO (?) |
Cis-regulatory sites | Verified or predicted cis-regulatory sites as defined by PFM or PWM | Manual curation: author, curator | Manual curation: curator | Sequences, ?? | SO (?) |
Microarray data | Microarray data are imported from GEO and ArrayExpress and mapped to C. elegans genomic sequence | Manual curation: Author, Curator | Manual curation: curator | Genes, Sequences, PCR products, ? | ? |
Protein structure and function | |||||
Protein analysis in vitro | Curation of protein function in vitro, e.g. enzymatic and transporter activities, overlaps with GO MF curation | HMMs, Manual curation: author, curator | HMMs, Manual curation: curator | Genes, Proteins, Molecules (Column 16?) | GO, ChEBI |
Mass spectrometry data | Mass spectrometry data used for curating gene models | SVM | Manual curation: curator | Genes, Proteins, Sequences | SO (?) |
Gene models, sequence changes | |||||
Gene structure corrections | Curation of genome sequence changes, alternative splice sites, poly(A) sites, etc. | SVM | Manual curation: curator | Genes, Sequences | SO (?) |
Allele sequence | Curation of sequences associated with allelic variations | SVM, Textpresso (categories or scripts?) | Manual curation: curator | Genes, Sequences, Variations, PCR products (?) | |
SNP sequence | Curation of new and existing single nucleotide polymorphisms | Manual curation: author | Manual curation: curator | Gene, Sequence, Strain | |
Cell function | |||||
Cell/anatomy terms | Curation of new anatomy terms or synonyms | Textpresso, Manual curation: author, curator | Manual curation: curator | | WAO |
Ablation data | Curation of the phenotype(s) resulting from laser ablation of cells or tissues | Textpresso, Manual curation: author | Manual curation: curator | Anatomy terms, Phenotype terms | WAO, WPO |
Tissue or cell site-of-action | Curation of tissue or cell in which a gene product acts by tissue- or cell type-specific expression | Textpresso, Manual curation: author | Manual curation: curator | Genes, Anatomy terms | WAO |
Mosaic analysis | Curation of tissue or cell in which a gene product acts by genetic mosaicism | Textpresso, Manual curation: author | Manual curation: curator | Genes, Anatomy terms | WAO |
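The table above can also be read as a lookup from data type to triage method. The following is a minimal sketch, not part of any WormBase tooling: the `DataType` record, the `ROWS` list and the `flagged_by` helper are hypothetical names introduced only for illustration, and the four rows are copied from the table with the author/curator qualifiers dropped for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    """One row of the curation table (hypothetical record layout)."""
    name: str
    flagging: tuple[str, ...]      # paper flagging (triage) methods
    extraction: tuple[str, ...]    # data extraction methods
    ontologies: tuple[str, ...] = ()


# A few rows transcribed from the table; qualifiers such as
# "author" or "curator" are omitted to keep the sketch short.
ROWS = [
    DataType("Phenotype analysis", ("SVM", "Author"), ("Manual curation",),
             ("WAO", "WPO", "Life Stage Ontology", "Molecule Controlled Vocabulary")),
    DataType("Gene Ontology (GO): Cellular Component",
             ("SVM", "Textpresso"), ("Textpresso",), ("GO",)),
    DataType("Ablation data", ("Textpresso", "Manual curation"),
             ("Manual curation",), ("WAO", "WPO")),
    DataType("Mosaic analysis", ("Textpresso", "Manual curation"),
             ("Manual curation",), ("WAO",)),
]


def flagged_by(method, rows=ROWS):
    """Return the names of data types whose triage step uses `method`."""
    return [row.name for row in rows if method in row.flagging]


if __name__ == "__main__":
    print(flagged_by("SVM"))
    print(flagged_by("Textpresso"))
```

Keeping flagging and extraction methods as separate fields mirrors the table's distinction between finding papers that contain a data type (triage) and extracting the data from them.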