WormBaseComparativeGenomicsTools

From WormBaseWiki
Revision as of 17:31, 27 June 2009 by Mh6

Genomic Alignments

Methods

WABA

Pairwise whole-genome alignments made with [UCSC WABA]. The data is available from the Gene pages (the link is called "syntenic alignment") and in the ACeDB database as Homol_data on the C. elegans and C. briggsae chromosomes.

PECAN

A multiple genome aligner ([EBI PECAN]) used in the [EnsEMBL Compara] pipeline to produce multiple alignments. PECAN replaced MLAGAN in the EnsEMBL-compara pipeline between WS173 and WS174.

Viewers

  • A preview of the new GBrowse-based synteny viewer is available at [dev.wormbase.org].
  • There is also a demo site showing the multiple alignments ([PECAN test]). On the test website you can search for C. elegans sequence names; it shows C. elegans/C. briggsae/C. remanei/C. brenneri alignments of the region overlapping the gene, with exons highlighted.

Raw Data

  • The chromosome GFF files in each release contain WABA entries.
  • ACeDB contains WABA Homol_data on the chromosomes.
  • EnsEMBL MySQL dumps are available on request from the Sanger FTP server ([WS176 based dumps]). They contain 4 ensembl-core databases and 2 ensembl-compara databases (1 compara-multiplealignment, 1 compara-ortholog).
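A minimal Python sketch for pulling the WABA entries out of a chromosome GFF file. It assumes the standard tab-separated GFF column layout and that WABA rows can be recognised by "waba" appearing in the source column (column 2); the sample rows and source names such as `waba_strong` are hypothetical, so check them against a real release file before relying on the filter.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class GffFeature(NamedTuple):
    seqid: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]  # None when the GFF score column is "."
    strand: str
    attributes: str


def waba_features(lines: Iterable[str]) -> Iterator[GffFeature]:
    """Yield GFF rows whose source column mentions WABA.

    Assumption: WABA rows are identified by "waba" in column 2 (source).
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and ##gff-version / comment lines
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a well-formed GFF feature row
        if "waba" not in cols[1].lower():
            continue  # keep only WABA-derived alignment rows
        score = None if cols[5] == "." else float(cols[5])
        attributes = cols[8] if len(cols) > 8 else ""
        yield GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                         score, cols[6], attributes)


if __name__ == "__main__":
    # Hypothetical sample rows; a real file can be streamed with open().
    sample = [
        "##gff-version 2",
        "CHROMOSOME_I\twaba_strong\tnucleotide_match\t100\t250\t80\t+\t.\tTarget \"Sequence:cb1\" 1 151",
        "CHROMOSOME_I\tcurated\tCDS\t300\t900\t.\t+\t0\tCDS \"X\"",
    ]
    for feat in waba_features(sample):
        print(feat.seqid, feat.source, feat.start, feat.end, feat.score)
    # prints: CHROMOSOME_I waba_strong 100 250 80.0
```

Because `waba_features` accepts any iterable of lines, an open file handle can be passed to it directly, so large chromosome files are processed without loading them into memory.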

Phylogenetics

Imported Data

We have started to improve cooperation with partner databases, so protein IDs should be in sync with the WormBase releases used by the respective groups.

TreeFam

TreeFam data is updated during the regular build from the [treefam.org] database. It is also viewable on the gene pages as a picture of a phylogenetic tree.

Inparanoid

Updated whenever the [Inparanoid] database is updated. Three different clusters are shown on the gene pages (nematode / metazoa / rest).

KOGs

[KOGs] are clusters of eukaryotic orthologous groups and are updated whenever the source database is. The clusters show predicted orthologues in fully sequenced eukaryotes.

OMA

The Orthologous Matrix project ([OMA]) is based on a massive cross-comparison of complete genomes that is currently performed by the CBRG group at ETH Zurich.

WormBase Data

Compara Orthologs

Orthologue relationships between genes represented in WormBase (C. elegans / C. briggsae / C. remanei / C. brenneri) are predicted during the regular build and included in the database. At the moment, C. remanei orthologs are shown as Ortholog_others and C. brenneri orthologs are masked. Ortholog_others (viewable in the Treeview of the Gene pages) contain orthologues to genes that are not (yet) included in the ACeDB database. The prediction of the nematode orthologs is based on conservation of gene order and homology in syntenic regions. We might switch to the gene tree code from EnsEMBL-compara for higher specificity, depending on how much sensitivity we would lose.
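The idea of combining homology with conserved gene order can be illustrated with a toy Python sketch: a homology pair is kept only if other homology pairs link the neighborhoods of both genes. This is a simplified illustration of the synteny principle, not the WormBase build code; the `window` and `min_support` parameters and all gene identifiers are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]          # (gene in genome A, gene in genome B)
Position = Tuple[str, int]      # (chromosome, rank of the gene along it)


def gene_positions(order: Dict[str, List[str]]) -> Dict[str, Position]:
    """Map each gene to its chromosome and rank from ordered gene lists."""
    return {gene: (chrom, rank)
            for chrom, genes in order.items()
            for rank, gene in enumerate(genes)}


def is_near(p: Optional[Position], q: Optional[Position], window: int) -> bool:
    """True if both genes sit on the same chromosome within `window` genes."""
    return (p is not None and q is not None
            and p[0] == q[0] and abs(p[1] - q[1]) <= window)


def syntenic_pairs(homologs: Set[Pair],
                   order_a: Dict[str, List[str]],
                   order_b: Dict[str, List[str]],
                   window: int = 3,
                   min_support: int = 1) -> Set[Pair]:
    """Keep homology pairs supported by other pairs in both neighborhoods."""
    pos_a = gene_positions(order_a)
    pos_b = gene_positions(order_b)
    kept: Set[Pair] = set()
    for a, b in homologs:
        # Count other pairs whose genes are close to a (in A) and to b (in B).
        support = sum(
            1 for x, y in homologs
            if (x, y) != (a, b)
            and is_near(pos_a.get(a), pos_a.get(x), window)
            and is_near(pos_b.get(b), pos_b.get(y), window)
        )
        if support >= min_support:
            kept.add((a, b))
    return kept
```

The pairwise loop is quadratic, which is fine for illustration; a real pipeline would index homology pairs by chromosome region instead. A pair whose partner sits alone on another chromosome gets no support and is dropped, even if the sequence similarity is high.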

Curated Orthologs

Ortholog relationships that have been published and/or submitted are shown with their supporting evidence (papers / authors / persons) in addition to the predictions. Where available, they are also viewable on the Gene pages.

EnsEMBL gene trees

A tree-based method to determine orthology/paralogy relationships of proteins. It is currently being tested by WormBase and is used in the main EnsEMBL releases.
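The tree-based idea can be sketched with the simple species-overlap rule: an internal node whose child subtrees share a species is treated as a duplication, otherwise as a speciation, and two genes are orthologs when their last common ancestor is a speciation node and paralogs otherwise. EnsEMBL's gene tree pipeline reconciles its trees with a species tree rather than using this shortcut, so treat this Python sketch as an illustration of the concept; the gene and species labels are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations, product
from typing import List, Optional, Set, Tuple

GenePair = Tuple[str, str]


@dataclass
class Node:
    gene: Optional[str] = None      # set on leaves only
    species: Optional[str] = None   # set on leaves only
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        if not self.children:
            return [self]
        return [leaf_node for child in self.children for leaf_node in child.leaves()]


def leaf(gene: str, species: str) -> Node:
    return Node(gene=gene, species=species)


def orthology(tree: Node) -> Tuple[Set[GenePair], Set[GenePair]]:
    """Split all gene pairs into orthologs and paralogs (species-overlap rule)."""
    orthologs: Set[GenePair] = set()
    paralogs: Set[GenePair] = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if not node.children:
            continue
        groups = [child.leaves() for child in node.children]
        species_sets = [{n.species for n in group} for group in groups]
        # Duplication if any two child subtrees contain the same species.
        duplication = any(s & t for s, t in combinations(species_sets, 2))
        target = paralogs if duplication else orthologs
        # Genes taken from different children have this node as their last common ancestor.
        for g, h in combinations(groups, 2):
            for x, y in product(g, h):
                target.add(tuple(sorted((x.gene, y.gene))))
        stack.extend(node.children)
    return orthologs, paralogs


if __name__ == "__main__":
    # An old duplication followed by speciation in both copies.
    tree = Node(children=[
        Node(children=[leaf("ce_a", "elegans"), leaf("cb_a", "briggsae")]),
        Node(children=[leaf("ce_b", "elegans"), leaf("cb_b", "briggsae")]),
    ])
    orth, para = orthology(tree)
    print(sorted(orth))
    # prints: [('cb_a', 'ce_a'), ('cb_b', 'ce_b')]
```

In the example, the root is called a duplication because both subtrees contain C. elegans and C. briggsae genes, so only the pairs inside each subtree come out as orthologs, while every pair spanning the two subtrees is labelled a paralog.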

Pitfalls

Orthologs

  • Different programs sometimes predict different orthologs for the same gene. To pick the most probable ortholog, view ALL available data and base your decision on it. You can also take available RNAi/knockout phenotypes and expression profiles into account.
  • Some genes have duplications, leading to one-to-many or many-to-many relationships.
  • Predicted gene models, especially from the newly sequenced organisms (C. briggsae/C. remanei/C. brenneri), are not always 100% correct, which can lead to unclear ortholog predictions. If you see such a case, submit your comments to WormBase (link at the bottom of the pages) so it can be fixed as fast as possible.
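One simple way to weigh all available data is to tally how many sources agree on each candidate ortholog. The Python sketch below assumes you have already collected, per source, the orthologs reported for a single query gene; the source names and gene IDs are hypothetical, and a vote count is only a starting point to combine with phenotype and expression data, not a replacement for it.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def rank_candidates(predictions: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Rank candidate orthologs of one query gene by number of supporting sources.

    `predictions` maps a source name to the set of orthologs it reports.
    """
    votes: Counter = Counter()
    for orthologs in predictions.values():
        votes.update(orthologs)  # one vote per source per candidate
    # Most-supported candidates first; ties broken alphabetically for stable output.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical source names and gene IDs for a single C. elegans query gene.
    predictions = {
        "TreeFam": {"CBG00001", "CBG00002"},
        "Inparanoid": {"CBG00001"},
        "Compara": {"CBG00001", "CBG00003"},
    }
    for gene, n in rank_candidates(predictions):
        print(gene, n)
    # prints CBG00001 3, then CBG00002 1 and CBG00003 1 on separate lines
```

Candidates with equal low support are not necessarily wrong: after a duplication, several genes can be genuine orthologs of the same query gene.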

Data Export

  • Try WormMart for data mining (the orthologs are under Filter: Homologs / Orthologs).
    • Inparanoid, KOGs and Compara can be accessed this way.
  • BioMart contains EnsEMBL orthologs based on WS170, if you prefer BioMart over ACeDB.
  • Use flatfiles or the ACeDB downloads for quicker local analysis.
  • WS180 will contain additional orthology data from L. Hillier and E. Schwarz, as well as OMIM.
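For quicker local analysis of an exported ortholog list, the Python sketch below reads gene/ortholog pairs and labels each orthology group as one-to-one, one-to-many, many-to-one or many-to-many, which makes the duplication cases mentioned under the pitfalls easy to spot. It assumes a hypothetical two-column tab-separated export with a header line; adjust `read_pairs` to the columns your WormMart, BioMart or flatfile export actually contains.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (query gene, ortholog)
Vertex = Tuple[str, str]  # (side, gene ID); the side tag keeps species apart


def read_pairs(lines: Iterable[str], skip_header: bool = True) -> List[Pair]:
    """Read gene/ortholog pairs from a (hypothetical) two-column TSV export."""
    rows = iter(lines)
    if skip_header:
        next(rows, None)  # drop the column header line
    pairs: List[Pair] = []
    for line in rows:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2 and cols[0] and cols[1]:
            pairs.append((cols[0], cols[1]))  # rows without an ortholog are skipped
    return pairs


def relationship_types(pairs: Iterable[Pair]) -> Dict[Pair, str]:
    """Label each pair by the shape of its orthology group, e.g. 'one-to-many'."""
    pair_list = list(pairs)
    # Bipartite graph between query genes and ortholog genes.
    graph: Dict[Vertex, Set[Vertex]] = defaultdict(set)
    for query, ortholog in pair_list:
        graph[("query", query)].add(("ortholog", ortholog))
        graph[("ortholog", ortholog)].add(("query", query))
    label_of: Dict[Vertex, str] = {}
    for start in list(graph):
        if start in label_of:
            continue  # already labelled as part of an earlier group
        # Breadth-first search collects one connected orthology group.
        group: List[Vertex] = []
        queue, seen = deque([start]), {start}
        while queue:
            node = queue.popleft()
            group.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        n_query = sum(1 for side, _ in group if side == "query")
        n_orth = len(group) - n_query
        label = ("one" if n_query == 1 else "many") + "-to-" + ("one" if n_orth == 1 else "many")
        for node in group:
            label_of[node] = label
    return {(q, o): label_of[("query", q)] for q, o in pair_list}
```

Labelling whole connected groups, rather than counting partners pair by pair, is the design choice here: with pairs a1-b1, a1-b2 and a2-b2, a per-pair count would call a1-b1 one-to-many, whereas the group as a whole is many-to-many.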