C.remanei

From WormBaseWiki
Revision as of 05:42, 29 February 2008 by Mh6 (talk | contribs) (→‎Curation)

Overview

Current Data

The C.remanei data available in WormBase is based on the Caenorhabditis_remanei-1.0 6x assembly from WashU and gene predictions using the WashU merged gene prediction pipeline. These gene predictions have not been submitted to EMBL/GenBank, but the assembly has.

Future Data

  • a newer 9.2x C.remanei assembly, Caenorhabditis_remanei-15.0.1, is available from WashU
  • the NGASP project plans to produce a new C.remanei gene set
  • manual curation is supposed to start as soon as NGASP predictions for the new assembly are available.
  • EMBL/GenBank submission is supposed to start together with manual curation.

State of Integration

As of WS188:

ACeDB

Genes, proteins and orthology predictions were imported in WS186 and are now updated automatically with each build. Blast homologies to C.remanei proteins are included for C.elegans and C.briggsae proteins.

Known Bugs

  • The protein sequences were not imported (fixed in WS188). Protein pages and proper cross links between them and their corresponding gene pages will be available in WS189.
  • some blast hits from C.elegans/C.briggsae don't link to the correct protein (fixed in WS188)

Website

The default code is used to show the ACeDB data, and a GBrowse version is available to show the GFF annotation. C.remanei is also included in the three-way genomic alignment viewer on the dev site.

Known Bugs

  • the live site can't show remanei gene pages (fixed on dev)
  • the small GBrowse window on the gene page doesn't show anything.
    • that is due to renaming of the C.remanei contigs to resolve conflicts with C.brenneri
    • an updated GFF file is planned for WS189 which will solve the GBrowse issues

Flat Files

GFF

A GFF file including CDSes and blast hits was created for WS186 and is available at remagff186. The GFF file is not yet complete: it doesn't include BLAT data or gene spans. An updated version is planned for WS189.

Proteins

A test version of the C.remanei proteins, called remapep, was generated for WS186 and can be found at remapep186. The generation of history files and the addition of all available IDs are planned for WS189.

DNA

The contigs used for annotation can be found at remagff186. The sequences should be identical to the WashU-1.0 assembly; only the contig names were prefixed with "Crem_".
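The contig renaming described above can be sketched as a small FASTA header rewrite. This is only an illustration, not the script used for the WormBase build: the function name prefix_fasta_headers and the example contig names (Contig0, Contig1) are made up, and it assumes the sequence name starts directly after the ">" character.

```python
def prefix_fasta_headers(lines, prefix="Crem_"):
    """Yield FASTA lines with `prefix` added to each sequence name.

    Header lines start with '>'; sequence lines pass through unchanged.
    Headers that already carry the prefix are left alone, so running
    the function twice does not produce 'Crem_Crem_...'.
    """
    for line in lines:
        if line.startswith(">") and not line.startswith(">" + prefix):
            yield ">" + prefix + line[1:]
        else:
            yield line


# Hypothetical input; real contig names in the WashU assembly may differ.
example = [">Contig0\n", "ACGT\n", ">Contig1 len=4\n", "TTGA\n"]
renamed = list(prefix_fasta_headers(example))
print("".join(renamed), end="")
```

Because only the header lines change, the sequences themselves stay byte-identical to the source assembly. Any GFF file that refers to these contigs needs the same prefix in its sequence-name column.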

Curation

  • gene models will be manually curated by WashU.
  • gene names are already curated in the traditional ex-CGC / WormGeneNames way (identifiable by a Crem- prefix)
  • for the current gene models only systematic errors will be fixed, as we expect NGASP to release a new gene set any day

History

2005

preliminary data was made available as flatfiles and through the WormBase GBrowse

WS186

integration of C.remanei data into the canonical ACeDB database

(future) WS189

  • integration of C.remanei into the regular build pipeline
  • website fixes

(future) WS190

  • C.remanei should be ready for inclusion in the frozen release