WS107

From WormBaseWiki
Jump to navigationJump to search

Release Letter

New release of WormBase WS107, Wormpep107 and Wormrna107 Fri Aug  8 13:44:00 BST 2003


WS107 was built by Paul Davis
======================================================================

This directory includes:
i)      database.WS107.*.tar.gz    -   compressed data for new release
ii)     models.wrm.WS107           -   the latest database schema (also in above database files)
iii)    CHROMOSOMES/subdir        -   contains 3 files (DNA, GFF & AGP per chromosome)
iv)     WS107-WS106.dbcomp          -   log file reporting difference from last release
v)      wormpep107.tar.gz          -   full Wormpep distribution corresponding to WS107
vi)     wormrna107.tar.gz          -   latest WormRNA release containing non-coding RNA's in the genome
vii)    confirmed_genes.WS107.gz   -   DNA sequences of all genes confirmed by EST &/or cDNA
viii)   yk2orf.WS107.gz            -    Latest set of ORF connections to each Yuji Kohara EST clone
ix)     gene_interpolated_map_positions.WS107.gz    - Interpolated map positions for each coding/RNA gene
x)      clone_interpolated_map_positions.WS107.gz    - Interpolated map positions for each clone
xi)     best_blastp_hits.WS107.gz    - for each C. elegans WormPep protein, lists Best blastp match to 
                                        human, fly, yeast, C. briggsae, and SwissProt & Trembl proteins.


Release notes on the web:
-------------------------
http://www.sanger.ac.uk/Projects/C_elegans/WORMBASE



Primary databases used in build WS107
------------------------------------
brigdb : 2003-07-25 - updated
camace : 2003-07-28 - updated
citace : 2003-07-27 - updated
cshace : 2003-07-22 - updated
genace : 2003-07-28 - updated
stlace : 2003-07-25 - updated


Genome sequence composition:
----------------------------

        WS107           WS106           change
----------------------------------------------
a       32367166        32367166          +0
c       17780238        17780238          +0
g       17757588        17757588          +0
t       32368415        32368415          +0
n       95              95                +0
-       0               0                 +0

Total   100273502       100273502         +0




Wormpep data set:
----------------------------

There are 19916 CDS in autoace, 22091 when counting 2175 alternate splice forms.

The 22091 sequences contain  base pairs in total.

Modified entries             169
Deleted entries               44
New entries                  409
Reappeared entries            15

Net change  +380




Status of entries: Confidence level of prediction
-------------------------------------------------
Confirmed              4501 (20.4%)
Partially_confirmed    8731 (39.5%)
Predicted              8859 (40.1%)



Status of entries: Protein Accessions
-------------------------------------
Swissprot accessions   2388 (10.8%)
TrEMBL accessions     18305 (82.9%)
TrEMBLnew accessions   1394 (6.3%)



Status of entries: Protein_ID's in EMBL
---------------------------------------
Protein_id            22087 (100.0%)



Locus <-> Sequence connections (cgc-approved)
---------------------------------------------
Entries with locus connection   4500


GeneModel correction progress WS106 -> WS107
-----------------------------------------
Confirmed introns not is a CDS gene model;

                +---------+--------+
                | Introns | Change |
                +---------+--------+
Cambridge       |   1129  |   -58  |
St Louis        |    935  |  -215  |
                +---------+--------+


Members of known repeat families that overlap predicted exons;

                +---------+--------+
                | Introns | Change |
                +---------+--------+
Cambridge       |     29  |     4  |
St Louis        |     30  |     2  |
                +---------+--------+



Synchronisation with GenBank / EMBL:
------------------------------------

No synchronisation issues


There are no gaps remaining in the genome sequence
---------------
For more info mail help@wormbase.org
-===================================================================================-



New Data:
---------

OPERON DATA UPDATE: Updated to resolve some of the reported errors and include newly identified operons.

CLONE END UPDATE: clone end data has been updated to include details of ~160 clone ends missing in previous releases.

NEW GENE MODELS: briggsae-elegans comparison revealed a large number of potential gene 
predictions not in elegans gene set. The 1st wave of these new entries are contained 
within the WS107 set.


New Fixes:
----------


Known Problems:
--------------
Problems with Movie data due to a parsing error, (possible failure to backslash 'http://...'?) 


Other Changes:
--------------

Proposed Changes / Forthcoming Data:
------------------------------------

Update to the Pfam data.

Introduction of Pseudogene class.

sl2 genes entered and complete.

mir gene family completed, names and predictions.

Transcript objects representing the full length transcript for genes didn't make it into WS107 
but will be added in the next one or two releases.  These will be calculated from available EST, 
OST, and mRNA data.


-===================================================================================-


Quick installation guide for UNIX/Linux systems
-----------------------------------------------

1. Create a new directory to contain your copy of WormBase,
        e.g. /users/yourname/wormbase

2. Unpack and untar all of the database.*.tar.gz files into
        this directory. You will need approximately 2-3 Gb of disk space.

3. Obtain and install a suitable acedb binary for your system
        (available from www.acedb.org).

4. Use the acedb 'xace' program to open your database, e.g.
        type 'xace /users/yourname/wormbase' at the command prompt.

5. See the acedb website for more information about acedb and
        using xace.

____________  END _____________