Latest revision as of 18:49, 22 May 2018

User's Guide For Wormbase SPELL

WormBase SPELL supports searching microarray, RNA-Seq, tiling array, and proteomics data for multiple species of nematodes.


I. New Search

You may enter one or more gene names; queries of more than two genes work best. SPELL evaluates the correlation of the query genes in each dataset, then displays the most relevant datasets and the most relevant genes.

Rank: The correlation among the query genes is evaluated in each dataset, and the datasets are ranked by it.

Contribution: This is the percentage of total correlation within the query set captured by each individual dataset. These percentages are the weights used when searching for additional genes related to the query set. If a single gene was used as a query, all datasets are weighted equally, and "--" is displayed.

ACS (Adjusted Correlation Score): This is a measure of the weighted correlation of a gene to the query set. These calculations are taken over z-scored correlation values, so these numbers can be interpreted as the number of standard deviations away from the average background correlation.
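To make the "number of standard deviations" reading concrete, here is a minimal sketch of z-scoring a set of correlation values. The function name, the input layout, and the choice of the population standard deviation are assumptions made for illustration; this page does not specify how SPELL defines the background.

```python
from statistics import mean, pstdev

def z_score_correlations(correlations):
    """Convert raw correlations into z-scores against their own background.

    `correlations` maps a gene name to its correlation with the query.
    Using every supplied value as the background is an assumption.
    """
    values = list(correlations.values())
    background_mean = mean(values)
    background_sd = pstdev(values)
    if background_sd == 0:
        # All correlations are identical, so no gene stands out.
        return {gene: 0.0 for gene in correlations}
    return {
        gene: (r - background_mean) / background_sd
        for gene, r in correlations.items()
    }
```

A gene with a z-score of 2 under this sketch has a correlation two standard deviations above the background average.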

GO Term Enrichment: This table shows terms from the Gene Ontology (GO) that are significantly enriched in the list of genes displayed as the result of your search. Significance is calculated using the hypergeometric distribution, and a Bonferroni correction is applied to the reported p-values. Note that both the query genes and the genes returned by the search are included in this calculation.

The GO term enrichment is performed over the query genes and all of the genes returned by the search. Enrichment is measured with a hypergeometric test, which assigns a p-value to the likelihood of observing this many genes annotated to a particular GO term. The P-val column contains the Bonferroni-corrected p-value resulting from this test, the % query column lists the number of genes in the displayed gene list that are annotated to that term, and the % genome column lists the number of genes in the entire genome annotated to that term.
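The enrichment test described above can be sketched as follows. This is an illustration of a one-sided hypergeometric test with Bonferroni correction, not SPELL's actual code; the function names and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """Probability of seeing at least k annotated genes.

    k: genes in the displayed list annotated to the term
    n: size of the displayed list (query genes plus returned genes)
    K: genes in the whole genome annotated to the term
    N: total number of genes in the genome
    """
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total

def bonferroni(p_value, num_tests):
    """Multiply by the number of terms tested, capped at 1."""
    return min(1.0, p_value * num_tests)

# Hypothetical numbers: 5 of 25 listed genes carry a term that
# annotates 100 of 20000 genes, with 1000 GO terms tested.
raw_p = hypergeom_pvalue(5, 25, 100, 20000)
corrected_p = bonferroni(raw_p, 1000)
```

The correction is deliberately conservative: a term is reported as enriched only if its p-value stays small after being multiplied by the number of terms tested.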


II. Dataset Listing

This section lists all datasets included in WormBase SPELL. Abstracts of the associated papers can be viewed. Users can turn on the "Options for Filtering Results by Dataset Tags" to filter the datasets by experiment type, species, tissue specificity, and biological process.


III. Show Expression Levels

After you enter one or more gene names, the expression levels of each query gene in each dataset are displayed as a color-coded bar. Red indicates high expression; green indicates low expression. Clicking the bar shows the actual expression values for each condition.


IV. About the website

This section explains the browser compatibility, algorithm details, and implementation details of SPELL.




SPELL FAQ


1. What is SPELL?

SPELL stands for "Serial Patterns of Expression Levels Locator". It is a web-based, context-sensitive search engine for microarray data.

The SPELL interface allows a researcher to provide a list of query genes. The search engine then reports which datasets are most relevant to that query, lists additional genes related to the query within the relevant conditions, and displays the expression levels of these genes. Links to extra information about each dataset, the original publications, and gene information are also provided.


2. What is the back-end database of SPELL?

SPELL runs on top of a MySQL database with four tables: "datasets", "exprs", "genes", and "users".

The "datasets" table describes the characteristics of each microarray dataset, including paper reference information, the number and names of microarray conditions, and so on.

The "exprs" table describes the expression level of each gene in each dataset. The expression levels are stored as a comma-separated string of numbers.

The "genes" table maps SPELL gene IDs to WormBase gene IDs.

The "users" table contains only SPELL user admin information.
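As an illustration of the comma-separated storage format, the sketch below parses one such string into a list of numbers. The function name is hypothetical, and treating an empty field as a missing measurement is an assumption; this page does not say how missing values are encoded.

```python
def parse_expression_row(raw):
    """Split a comma-separated expression string into floats.

    Empty fields become None (assumed to mean a missing value).
    """
    values = []
    for field in raw.split(","):
        field = field.strip()
        values.append(float(field) if field else None)
    return values

# One value per condition in the dataset, in condition order.
levels = parse_expression_row("0.5, -1.2,,2")
```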


3. How does SPELL work?

The main algorithm of the SPELL search engine is described in the paper: Hibbs MA, et al. Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics, 2007.

Briefly, given a set of query genes provided to the engine, a relevance weight is determined for each dataset based on how well correlated the query genes are in that dataset. Datasets where the query genes are largely co-expressed receive a high weight, while datasets where the query genes do not agree are given a low weight. Based on these per-dataset weights, weighted correlations to the query set are calculated for every other gene in the genome. In this way, genes that agree with the query set in datasets where the query is consistent achieve the best scores, while genes that agree with some of the query set in datasets where the query is not co-expressed receive a low score. Datasets and genes are sorted by their correlation scores and returned for each query.
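The two-step idea above (weight each dataset, then score genes by weighted correlation) can be sketched in a few lines. This is a simplified illustration, not the published implementation: the weight formula (mean pairwise Pearson correlation among query genes, clipped at zero), the equal-weight fallback trigger, and all function names are assumptions, and the z-scoring step is omitted.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if flat)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def dataset_weight(dataset, query):
    """Assumed weight: mean pairwise query correlation, floored at 0."""
    present = [g for g in query if g in dataset]
    pairs = list(combinations(present, 2))
    if not pairs:
        return 0.0
    return max(0.0, mean(pearson(dataset[a], dataset[b]) for a, b in pairs))

def spell_like_search(datasets, query):
    """Return normalized dataset weights and genes ranked by score.

    `datasets` maps a dataset name to {gene: expression profile}.
    """
    weights = {name: dataset_weight(d, query) for name, d in datasets.items()}
    total = sum(weights.values())
    if total == 0:
        # No dataset shows signal for the query: weight all equally.
        weights = {name: 1.0 for name in datasets}
        total = float(len(datasets))
    weights = {name: w / total for name, w in weights.items()}

    scores = {}
    for name, d in datasets.items():
        present = [g for g in query if g in d]
        if not present:
            continue
        for gene, profile in d.items():
            if gene in query:
                continue
            r = mean(pearson(profile, d[q]) for q in present)
            scores[gene] = scores.get(gene, 0.0) + weights[name] * r
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return weights, ranked
```

With a single query gene there are no gene pairs to correlate, so this sketch falls back to equal weights, which mirrors the behavior described for single-gene queries in the user's guide.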

There are two edge cases not discussed in the main text that you may encounter:

First, if no datasets are found to contain a significant signal for a given query, then per-dataset weights cannot be assigned for the search. In this case a warning message is displayed, and all datasets are weighted equally for that query.

Second, if no genes are related to the query set at a reliable confidence level, then a warning message is displayed and the confidence level is weakened until results can be obtained.

Both of these cases typically only occur when the query genes are either largely unrelated, or highly unique. Neither of these cases occurs very often.


4. How can I install a local SPELL mirror?

The SPELL search engine interface was built using the Ruby on Rails framework, with a Java back-end to perform the searches. AJAX elements were created using the Prototype, jQuery, Thickbox, and JTip libraries.

For details on installing a local WormBase SPELL mirror, please contact help @ wormbase.org.