Orthology, Homology and Paralog data in WormBase

From WormBaseWiki

1. Orthology, Homology and Paralog data

  • Ace tags: ?Gene Ortholog_other, Paralog
  • Contact: Michael Paulini

From Michael Paulini:

a.) orthology
Orthology data is stored in the ACeDB database on the genes as ortholog/paralog, and for some time now we have also dumped it with each build (for example, here: ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/species/c_elegans/PRJNA13758/annotation).
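As a rough illustration of working with such a per-build dump, the sketch below reads an orthologs file in Python. The record layout is an assumption that is not stated above: records separated by a line containing only "=", a first line holding the gene ID and public name, then one tab-separated line per ortholog (species, gene ID, public name, methods). Check the header comments of the actual file before relying on it. The function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# species, gene ID, public name, list of methods
Ortholog = Tuple[str, str, str, List[str]]


def parse_ortholog_records(lines: Iterable[str]) -> Dict[str, List[Ortholog]]:
    """Group ortholog lines under the gene ID that heads each record.

    ASSUMED layout (verify against the real file):
      =
      <gene ID> TAB <public name>
      <species> TAB <ortholog ID> TAB <public name> TAB <methods>
    """
    result: Dict[str, List[Ortholog]] = {}
    current = None        # gene ID of the record currently being read
    expect_gene = False   # True immediately after a "=" separator line
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if line == "=":
            expect_gene = True
            current = None
            continue
        fields = line.split("\t")
        if expect_gene:
            current = fields[0]
            result[current] = []
            expect_gene = False
        elif current is not None and len(fields) >= 4:
            species, gene_id, name, methods = fields[:4]
            # Assumption: multiple methods are separated by ";"
            result[current].append((species, gene_id, name, methods.split(";")))
    return result
```

Typical use would be to open the downloaded file with `gzip.open(path, "rt")` if it is compressed and pass the file object straight to `parse_ortholog_records`.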

b.) homology
1. protein homology)
The blastx data is in the GFF files, as well as here: ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/acedb/Non_C_elegans_BLAST
For C. elegans, the patch file is also loaded during the build, so you can find the hits as regular Homology_data on the respective parent sequences in ACeDB.

The blastp data is stored as Homology_data on the proteins, and is also partially dumped into this file: ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS243.best_blastp_hits.txt.gz
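The path of the blastp dump follows a visible release/species/BioProject pattern. A minimal Python sketch that rebuilds it is shown here; the function names are hypothetical, and the assumption that other releases and species use the same naming scheme has not been checked against the FTP site.

```python
BASE = "ftp://ftp.wormbase.org/pub/wormbase/releases"


def species_dir(release: str, species: str, bioproject: str) -> str:
    """Directory holding the per-species dumps of one release."""
    return f"{BASE}/{release}/species/{species}/{bioproject}"


def best_blastp_hits_url(release: str, species: str, bioproject: str) -> str:
    """URL of the best_blastp_hits dump (naming pattern assumed to be stable)."""
    filename = f"{species}.{bioproject}.{release}.best_blastp_hits.txt.gz"
    return f"{species_dir(release, species, bioproject)}/{filename}"
```

Calling `best_blastp_hits_url("WS243", "c_elegans", "PRJNA13758")` reproduces the WS243 URL quoted above.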

We also have protein clusters (mostly eggNOG-based), which show homology and shared function and are connected to their member proteins in ACeDB through the Homology_group tag.

2. nucleotide homology) 
This is mostly based on BLAT, but as of the current release it has switched to STAR.
You can find the alignments in the respective GFF files and, as with the blastx data, as Homology_data on the parent sequences in ACeDB.
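To pull such alignment features out of a GFF file, one option is to filter on the source column. The sketch below relies only on the standard nine-column, tab-separated GFF layout; the function and class names are hypothetical, and the source value you filter on (for instance a prefix such as "BLAT_") is an assumption to be checked against the GFF file of the release you are using.

```python
from typing import Iterable, Iterator, NamedTuple


class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: str


def iter_features_by_source(lines: Iterable[str], source_prefix: str) -> Iterator[GffFeature]:
    """Yield GFF features whose source column starts with source_prefix."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        if cols[1].startswith(source_prefix):
            yield GffFeature(cols[0], cols[1], cols[2],
                             int(cols[3]), int(cols[4]), *cols[5:])
```

Because the function is a generator, it can be fed an open file handle and will stream through a large GFF file without loading it into memory.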

We also have RNASeq data, which currently lives as RNASeq features in the GFF files and ACeDB, and also as expression-level data on the Gene/Transcript/CDS objects.

Last but not least, we have pairwise whole-genome alignments for selected species. These are currently shown only on EnsEMBL Genomes, but you can use the generic Compara API to pull the alignments from there.

As orthology + homology covers such a huge swath of very different data in WormBase, there is no unifying format, except ACeDB and to a certain extent GFF.

Back to Generation of automated descriptions