Difference between revisions of "UserGuide:SimpleMine"

From WormBaseWiki
Latest revision as of 19:29, 12 August 2020

SimpleMine Users' Guide

SimpleMine is designed for biologists who want to get essential information for a list of genes without needing any command-line or programming skills. The fields described below are what we consider "essential information", based on user feedback. Please feel free to contact us if you would like more information added to the list.

Input: Users can submit CGC names, sequence names, WormBase Gene IDs, WormPep IDs, UniProt IDs, TreeFam IDs, and RefSeq IDs.

Output: Users can opt for HTML display or download a tab-delimited file. The output has one row per gene, and each cell contains one data field. Within a cell, information is divided by three tiers of separators: commas first, then bars, then semicolons. The explanation of each data field below describes how information is organized for that particular type of data.
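As a rough illustration of the three separator tiers, the following Python sketch splits one cell of a downloaded file into entries, fields, and sub-values. It assumes that "bar" means the vertical bar character `|` and that values may be padded with spaces; the column name `Example Field` and the sample row are made up for the example and are not real SimpleMine output. Fields whose values can themselves contain commas would need more careful handling than this.

```python
import csv
import io


def split_cell(cell):
    """Split one cell into entries (comma), fields (bar), sub-values (semicolon)."""
    entries = []
    for entry in cell.split(","):
        fields = [
            [value.strip() for value in field.split(";")]
            for field in entry.strip().split("|")
        ]
        entries.append(fields)
    return entries


# Hypothetical tab-delimited download with a single gene row.
sample = "Gene\tExample Field\ngeneA\tx1|a;b,x2|c\n"
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))

parsed = split_cell(rows[0]["Example Field"])
print(parsed)
```

Not every field uses all three tiers, so in practice only the tiers described for a given field would be applied.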


Names, Identifiers, Sequences, Species

WormBase Gene ID: Unique gene identifiers used by WormBase.

Public Name: Official gene names specified by WormBase. A public name can be a CGC name or a sequence name.

Species: Each gene can only be associated with one species.

Sequence Name: Sequence name of the gene.

Other Name: All names that have been used for the gene in publications.

Transcript: Transcript names of the gene.

Operon: A set of genes transcribed under the control of an operator gene.

WormPep: Protein IDs used by WormBase.

Protein Domain: Protein domains associated with the gene. If a gene has multiple peptides, the peptides are comma-separated. Each peptide entry lists all of its predicted protein domains, separated by bars.

UniProt: Official protein identifiers used by the UniProt database.

Reference UniProt ID: There has been an attempt to assign a single, unique UniProt ID for each gene to act as a "reference" UniProt ID for the gene. This is called the Gene-Centric Reference Proteome (GCRP) ID and is usually an existing SwissProt or TrEMBL ID that has been selected to represent the gene.

TreeFam: Official gene identifiers used by the TreeFam database.

RefSeq_mRNA: Sequence IDs used by the RefSeq database.

RefSeq_protein: Protein IDs used by the RefSeq database.


Genetics, Phenotypes, Interactions

Genetic Map Position: Display the chromosome and chromosomal position of the gene.

RNAi Phenotype Observed: Display the phenotype ontology names sorted in case-insensitive alphabetical order.

Allele Phenotype Observed: Display the phenotype ontology names sorted in case-insensitive alphabetical order.
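To show what case-insensitive alphabetical order means in practice, here is a small Python sketch. The phenotype labels are invented for the example, and using `str.casefold` as the sort key is an assumption about how such an ordering could be reproduced, not a description of SimpleMine's own implementation.

```python
def sort_case_insensitive(names):
    """Return names in alphabetical order, ignoring letter case."""
    return sorted(names, key=str.casefold)


# Made-up phenotype labels with mixed capitalization.
names = ["sterile", "Dumpy", "lethal", "Uncoordinated"]

print(sort_case_insensitive(names))
# ['Dumpy', 'lethal', 'sterile', 'Uncoordinated']

# A plain sort puts all capitalized names first, which is not what is displayed.
print(sorted(names))
# ['Dumpy', 'Uncoordinated', 'lethal', 'sterile']
```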

Coding_exon Non_silent Allele: Display a list of alleles that fall in any coding exon. Polymorphisms and silent alleles are excluded. Alleles are sorted and displayed according to the following order of their molecular changes: Deletion, Insertion, Substitution, Tandem_duplication. In each molecular change category, alleles are sorted in alphabetical order. Each allele entry contains three bar-separated fields: allele name, molecular change, and protein effect.
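The allele ordering described above can be sketched in Python as a two-part sort key: first the rank of the molecular change, then the allele name. The allele names and protein effects in the sample are hypothetical, the bar is assumed to be `|`, and placing unrecognized molecular changes last is a choice made for this sketch only.

```python
# Order of molecular changes as stated in the field description.
CHANGE_ORDER = ["Deletion", "Insertion", "Substitution", "Tandem_duplication"]


def parse_allele(entry):
    """Split 'name|molecular change|protein effect' into a 3-tuple."""
    name, change, effect = (part.strip() for part in entry.split("|"))
    return name, change, effect


def sort_alleles(entries):
    """Sort allele entries by molecular change category, then by allele name."""
    rank = {change: i for i, change in enumerate(CHANGE_ORDER)}

    def key(entry):
        name, change, _effect = parse_allele(entry)
        # Unknown categories sort after the four listed ones (sketch assumption).
        return (rank.get(change, len(CHANGE_ORDER)), name)

    return sorted(entries, key=key)


# Hypothetical allele entries; names and effects are illustrative only.
alleles = [
    "zz2|Deletion|Frameshift",
    "aa1|Substitution|Missense",
    "bb3|Deletion|Nonsense",
    "cc4|Insertion|Frameshift",
]
print(sort_alleles(alleles))
```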

Interacting Gene: We only display experimentally confirmed gene interactions (Physical, Genetic, and Regulatory). The genes are displayed in the following order: genes with all three types of interactions detected, genes with two out of three types of interactions detected, genes with one type of interaction detected. In each category, gene names are sorted in case-insensitive alphabetical order. Each gene entry contains two bar-separated fields: gene name and interaction types. The interaction types are separated with semicolons.
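The ordering of interacting genes can be pictured as a sort on two keys: the number of distinct interaction types (most first), then the gene name without regard to case. The Python sketch below assumes `|` as the bar character and uses invented gene names; it is one way to reproduce or re-check the order, not SimpleMine's actual code.

```python
def parse_interactor(entry):
    """Split 'gene|type1;type2' into (gene, [types])."""
    gene, types = entry.split("|", 1)
    return gene.strip(), [t.strip() for t in types.split(";")]


def sort_interactors(entries):
    """Genes with more interaction types first, then case-insensitive by name."""

    def key(entry):
        gene, types = parse_interactor(entry)
        return (-len(set(types)), gene.casefold())

    return sorted(entries, key=key)


# Hypothetical interacting-gene entries.
interactors = [
    "geneC|Genetic",
    "geneB|Physical;Genetic;Regulatory",
    "GeneA|Genetic;Regulatory",
    "geneD|Physical;Regulatory",
]
print(sort_interactors(interactors))
```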


Expression

Expr_pattern Tissue: Anatomical expression based on GFP, immunoprecipitation, In_situ, etc. Anatomy names are displayed in case-insensitive alphabetical order.

Genomic Study Tissue: Tissue enrichment based on microarray, RNA-Seq, and proteomics studies. Anatomy names are displayed in case-insensitive alphabetical order.

Expr_pattern LifeStage: Developmental expression based on GFP, immunoprecipitation, In_situ, etc. Life stages are displayed in case-insensitive alphabetical order.

Genomic Study LifeStage: Developmental expression based on microarray, RNA-Seq, and proteomics studies. Life stages are displayed in case-insensitive alphabetical order.


Human Orthologs and Disease

Disease Info: Display the disease names associated with the gene. Each disease entry contains two bar-separated fields: disease name and the evidence (By Orthology or By Experiment).

Human Ortholog: Display the human orthologs of the gene. Each ortholog entry contains two bar-separated fields: ortholog name and algorithms that predicted the orthology. The algorithms are separated with semicolons.
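For readers who post-process the downloaded file, the Human Ortholog cell could be turned into a lookup table along the following lines. This Python sketch assumes that multiple ortholog entries in a cell are comma-separated (the first separator tier) and that the bar is `|`; the ortholog and algorithm names are placeholders, not real predictions.

```python
def parse_orthologs(cell):
    """Map each ortholog name to the list of algorithms that predicted it."""
    result = {}
    for entry in cell.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, algorithms = entry.partition("|")
        result[name.strip()] = [a.strip() for a in algorithms.split(";") if a.strip()]
    return result


# Placeholder cell content; names are illustrative only.
cell = "ORTH1|algoA;algoB, ORTH2|algoC"
orthologs = parse_orthologs(cell)
print(orthologs)

# Example use: keep orthologs supported by at least two algorithms.
well_supported = [name for name, algos in orthologs.items() if len(algos) >= 2]
print(well_supported)
```

The same pattern would apply to Disease Info, with the second field holding the evidence rather than a list of algorithms, provided the disease names themselves contain no commas.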


Functional Annotation and References

Gene Ontology Association: Display the names of the Gene Ontology terms annotated to the gene, sorted in alphabetical order.

Concise Description: Outdated, manually written descriptions of gene function.

Automated Description: Up-to-date, machine-generated gene descriptions based on current WormBase data.

Expression Cluster Summary: Gene regulation, molecular regulation, and tissue enrichment summary based on microarray, RNA-Seq, and proteomics studies.

Reference: Primary research articles that studied the gene.