WS107
From WormBaseWiki
Jump to navigationJump to search
Contents
Release Letter
New release of WormBase WS107, Wormpep107 and Wormrna107 Fri Aug 8 13:44:00 BST 2003 WS107 was built by Paul Davis ====================================================================== This directory includes: i) database.WS107.*.tar.gz - compressed data for new release ii) models.wrm.WS107 - the latest database schema (also in above database files) iii) CHROMOSOMES/subdir - contains 3 files (DNA, GFF & AGP per chromosome) iv) WS107-WS106.dbcomp - log file reporting difference from last release v) wormpep107.tar.gz - full Wormpep distribution corresponding to WS107 vi) wormrna107.tar.gz - latest WormRNA release containing non-coding RNA's in the genome vii) confirmed_genes.WS107.gz - DNA sequences of all genes confirmed by EST &/or cDNA viii) yk2orf.WS107.gz - Latest set of ORF connections to each Yuji Kohara EST clone ix) gene_interpolated_map_positions.WS107.gz - Interpolated map positions for each coding/RNA gene x) clone_interpolated_map_positions.WS107.gz - Interpolated map positions for each clone xi) best_blastp_hits.WS107.gz - for each C. elegans WormPep protein, lists Best blastp match to human, fly, yeast, C. briggsae, and SwissProt & Trembl proteins. Release notes on the web: ------------------------- http://www.sanger.ac.uk/Projects/C_elegans/WORMBASE Primary databases used in build WS107 ------------------------------------ brigdb : 2003-07-25 - updated camace : 2003-07-28 - updated citace : 2003-07-27 - updated cshace : 2003-07-22 - updated genace : 2003-07-28 - updated stlace : 2003-07-25 - updated Genome sequence composition: ---------------------------- WS107 WS106 change ---------------------------------------------- a 32367166 32367166 +0 c 17780238 17780238 +0 g 17757588 17757588 +0 t 32368415 32368415 +0 n 95 95 +0 - 0 0 +0 Total 100273502 100273502 +0 Wormpep data set: ---------------------------- There are 19916 CDS in autoace, 22091 when counting 2175 alternate splice forms. The 22091 sequences contain base pairs in total. Modified entries 169 Deleted entries 44 New entries 409 Reappeared entries 15 Net change +380 Status of entries: Confidence level of prediction ------------------------------------------------- Confirmed 4501 (20.4%) Partially_confirmed 8731 (39.5%) Predicted 8859 (40.1%) Status of entries: Protein Accessions ------------------------------------- Swissprot accessions 2388 (10.8%) TrEMBL accessions 18305 (82.9%) TrEMBLnew accessions 1394 (6.3%) Status of entries: Protein_ID's in EMBL --------------------------------------- Protein_id 22087 (100.0%) Locus <-> Sequence connections (cgc-approved) --------------------------------------------- Entries with locus connection 4500 GeneModel correction progress WS106 -> WS107 ----------------------------------------- Confirmed introns not is a CDS gene model; +---------+--------+ | Introns | Change | +---------+--------+ Cambridge | 1129 | -58 | St Louis | 935 | -215 | +---------+--------+ Members of known repeat families that overlap predicted exons; +---------+--------+ | Introns | Change | +---------+--------+ Cambridge | 29 | 4 | St Louis | 30 | 2 | +---------+--------+ Synchronisation with GenBank / EMBL: ------------------------------------ No synchronisation issues There are no gaps remaining in the genome sequence --------------- For more info mail worm@sanger.ac.uk -===================================================================================- New Data: --------- OPERON DATA UPDATE: Updated to resolve some of the reported errors and include newly identified operons. CLONE END UPDATE: clone end data has been updated to include details of ~160 clone ends missing in previous releases. NEW GENE MODELS: briggsae-elegans comparison revealed a large number of potential gene predictions not in elegans gene set. The 1st wave of these new entries are contained within the WS107 set. New Fixes: ---------- Known Problems: -------------- Problems with Movie data due to a parsing error, (possible failure to backslash 'http://...'?) Other Changes: -------------- Proposed Changes / Forthcoming Data: ------------------------------------ Update to the Pfam data. Introduction of Pseudogene class. sl2 genes entered and complete. mir gene family completed, names and predictions. Transcript objects representing the full length transcript for genes didn't make it into WS107 but will be added in the next one or two releases. These will be calculated from available EST, OST, and mRNA data. -===================================================================================- Quick installation guide for UNIX/Linux systems ----------------------------------------------- 1. Create a new directory to contain your copy of WormBase, e.g. /users/yourname/wormbase 2. Unpack and untar all of the database.*.tar.gz files into this directory. You will need approximately 2-3 Gb of disk space. 3. Obtain and install a suitable acedb binary for your system (available from www.acedb.org). 4. Use the acedb 'xace' program to open your database, e.g. type 'xace /users/yourname/wormbase' at the command prompt. 5. See the acedb website for more information about acedb and using xace. ____________ END _____________