− | New release of WormBase WS196, Wormpep196 and Wormrna196 Fri Oct 31 16:00:42 GMT 2008
| |
| | | |
− |
| |
− | WS196 was built by Mary Ann and Anthony
| |
− | ======================================================================
| |
− |
| |
− | This directory includes:
| |
− | i) database.WS196.*.tar.gz - compressed data for new release
| |
− | ii) models.wrm.WS196 - the latest database schema (also in above database files)
| |
− | iii) CHROMOSOMES/subdir - contains 3 files (DNA, GFF & AGP per chromosome)
| |
− | iv) WS196-WS195.dbcomp - log file reporting differences from the last release
| |
− | v) wormpep196.tar.gz - full Wormpep distribution corresponding to WS196
| |
− | vi) wormrna196.tar.gz - latest WormRNA release containing non-coding RNAs in the genome
| |
− | vii) confirmed_genes.WS196.gz - DNA sequences of all genes confirmed by EST &/or cDNA
| |
− | viii) cDNA2orf.WS196.gz - Latest set of ORF connections to each cDNA (EST, OST, mRNA)
| |
− | ix) gene_interpolated_map_positions.WS196.gz - Interpolated map positions for each coding/RNA gene
| |
− | x) clone_interpolated_map_positions.WS196.gz - Interpolated map positions for each clone
| |
− | xi) best_blastp_hits.WS196.gz - for each C. elegans WormPep protein, lists the best blastp match to
| |
− | human, fly, yeast, C. briggsae, and SwissProt & TrEMBL proteins.
| |
− | xii) best_blastp_hits_brigprot.WS196.gz - for each C. briggsae protein, lists the best blastp match to
| |
− | human, fly, yeast, C. elegans, and SwissProt & TrEMBL proteins.
| |
− | xiii) geneIDs.WS196.gz - list of all current gene identifiers with CGC & molecular names (when known)
| |
− | xiv) PCR_product2gene.WS196.gz - Mappings between PCR products and overlapping Genes
| |
− |
| |
− |
| |
− | Release notes on the web:
| |
− | -------------------------
| |
− | http://www.wormbase.org/wiki/index.php/Release_Schedule
| |
− |
| |
− |
| |
− |
| |
− | Genome sequence composition:
| |
− | ----------------------------
| |
− |
| |
− | WS196 WS195 change
| |
− | ----------------------------------------------
| |
− | a 32365950 32365950 +0
| |
− | c 17779889 17779889 +0
| |
− | g 17756041 17756041 +0
| |
− | t 32365753 32365753 +0
| |
− | n 0 0 +0
| |
− | - 0 0 +0
| |
− |
| |
− | Total 100267633 100267633 +0
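The composition table above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the WormBase build; the variable names are illustrative, and the counts are copied from the WS196 column. It sums the per-base counts and derives the G+C fraction, which the release letter itself does not report.

```python
# Base counts copied from the WS196 column of the composition table.
counts_ws196 = {
    "a": 32365950,
    "c": 17779889,
    "g": 17756041,
    "t": 32365753,
    "n": 0,
    "-": 0,
}

# The sum of the per-base counts should equal the reported Total.
total_bases = sum(counts_ws196.values())

# G+C fraction (a derived figure, not stated in the release letter).
gc_fraction = (counts_ws196["c"] + counts_ws196["g"]) / total_bases

if __name__ == "__main__":
    print(f"total bases: {total_bases}")
    print(f"G+C fraction: {gc_fraction:.4f}")
```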
| |
− |
| |
− |
| |
− | Chromosomal Changes:
| |
− | --------------------
| |
− | There are no changes to the chromosome sequences in this release.
| |
− |
| |
− |
| |
− | Gene data set (Live C. elegans genes 29872)
| |
− | ------------------------------------------
| |
− | Molecular_info 28155 (94.3%)
| |
− | Concise_description 5447 (18.2%)
| |
− | Reference 13254 (44.4%)
| |
− | WormBase_approved Gene name 15636 (52.3%)
| |
− | RNAi_result 20799 (69.6%)
| |
− | Microarray_results 20438 (68.4%)
| |
− | SAGE_transcript 18838 (63.1%)
| |
− |
| |
− |
| |
− |
| |
− |
| |
− | Wormpep data set:
| |
− | ----------------------------
| |
− |
| |
− | There are 20191 CDS in autoace, 23902 when counting 3711 alternate splice forms.
| |
− |
| |
− | The 23902 sequences contain 10,518,004 amino acid residues in total.
| |
− |
| |
− | Modified entries 66
| |
− | Deleted entries 25
| |
− | New entries 21
| |
− | Reappeared entries 0
| |
− |
| |
− | Net change -4
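The Wormpep figures above are internally consistent, as the short sketch below shows. It assumes that the net change is computed as new plus reappeared minus deleted entries; the release letter does not state the formula, but this reading reproduces the reported value. All variable names are illustrative.

```python
# Figures copied from the Wormpep data set section.
cds_count = 20191          # CDS in autoace
alt_splice_forms = 3711    # alternate splice forms
new_entries = 21
deleted_entries = 25
reappeared_entries = 0

# Total number of sequences when alternate splice forms are counted.
total_sequences = cds_count + alt_splice_forms

# Assumed definition of net change: entries gained minus entries lost.
net_change = new_entries + reappeared_entries - deleted_entries

if __name__ == "__main__":
    print(f"total sequences: {total_sequences}")
    print(f"net change: {net_change:+d}")
```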
| |
− |
| |
− |
| |
− |
| |
− | Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
| |
− | -------------------------------------------------
| |
− | Confirmed 8517 (35.6%) Every base of every exon has transcription evidence (mRNA, EST etc.)
| |
− | Partially_confirmed 10985 (46.0%) Some, but not all exon bases are covered by transcript evidence
| |
− | Predicted 4400 (18.4%) No transcriptional evidence at all
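The percentages in the confidence table can be recomputed from the raw counts. This is a minimal sketch with illustrative names; it assumes each percentage is the count divided by the sum of the three categories, rounded to one decimal place.

```python
# Entry counts copied from the confidence-level table.
status_counts = {
    "Confirmed": 8517,
    "Partially_confirmed": 10985,
    "Predicted": 4400,
}

# The three categories together should account for every Wormpep entry.
total_entries = sum(status_counts.values())

# Percentage of entries in each category, rounded to one decimal place.
status_percent = {
    name: round(100 * count / total_entries, 1)
    for name, count in status_counts.items()
}

if __name__ == "__main__":
    for name, pct in status_percent.items():
        print(f"{name:<20} {status_counts[name]:>6} ({pct}%)")
```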
| |
− |
| |
− |
| |
− |
| |
− | Status of entries: Protein Accessions
| |
− | -------------------------------------
| |
− | UniProtKB accessions 23768 (99.4%)
| |
− |
| |
− |
| |
− |
| |
− | Status of entries: Protein_ID's in EMBL
| |
− | ---------------------------------------
| |
− | Protein_id 23772 (99.5%)
| |
− |
| |
− |
| |
− |
| |
− | Gene <-> CDS,Transcript,Pseudogene connections
| |
− | ----------------------------------------------
| |
− | Caenorhabditis elegans entries with WormBase-approved Gene name 13967
| |
− |
| |
− |
| |
− | GeneModel correction progress WS195 -> WS196
| |
− | -----------------------------------------
| |
− | Confirmed introns not in a CDS gene model:
| |
− |
| |
− | +---------+--------+
| |
− | | Introns | Change |
| |
− | +---------+--------+
| |
− | Cambridge | 40 | 13 |
| |
− | St Louis | 118 | 15 |
| |
− | +---------+--------+
| |
− |
| |
− |
| |
− | Members of known repeat families that overlap predicted exons:
| |
− |
| |
− | +---------+--------+
| |
− | | Repeats | Change |
| |
− | +---------+--------+
| |
− | Cambridge | 6 | 0 |
| |
− | St Louis | 6 | 0 |
| |
− | +---------+--------+
| |
− |
| |
− |
| |
− |
| |
− | Synchronisation with GenBank / EMBL:
| |
− | ------------------------------------
| |
− |
| |
− | No synchronisation issues
| |
− |
| |
− |
| |
− | There are no gaps remaining in the genome sequence
| |
− | ---------------
| |
− | For more info mail worm@sanger.ac.uk
| |
− | -===================================================================================-
| |
− |
| |
− |
| |
− |
| |
− | New Data:
| |
− | ---------
| |
− | This release includes the C. brenneri genome for the first time. Orthologies to the other species in WormBase have been determined and Gene names transferred from C. elegans where appropriate.
| |
− |
| |
− | There has been some manual correction of the C. briggsae nGASP gene predictions.
| |
− |
| |
− | Precalculated protein alignments are now made for each main WormBase species.
| |
− | These alignments are available from the Sanger FTP site at ftp.sanger.ac.uk/pub2/wormbase/live_release/wormpep_clw.sql.bz2. For more information, contact WormBase.
| |
− |
| |
− | Genome sequence updates:
| |
− | -----------------------
| |
− |
| |
− |
| |
− | New Fixes:
| |
− | ----------
| |
− |
| |
− |
| |
− | Known Problems:
| |
− | ---------------
| |
− | Because of the volume of data and the time pressure to include C. brenneri in this release, we did not have time to load all of the blast hits for every species.
| |
− | The hits are still available in the GFF files, and so are visible in the genome browser, but they cannot be queried through the acedb database.
| |
− |
| |
− |
| |
− | Other Changes:
| |
− | --------------
| |
− |
| |
− | Proposed Changes / Forthcoming Data:
| |
− | -------------------------------------
| |
− |
| |
− |
| |
− |
| |
− | Model Changes:
| |
− | ------------------------------------
| |
− | added Person to #Affiliation
| |
− |
| |
− |
| |
− | -===================================================================================-
| |
− |
| |
− |
| |
− | Quick installation guide for UNIX/Linux systems
| |
− | -----------------------------------------------
| |
− |
| |
− | 1. Create a new directory to contain your copy of WormBase,
| |
− | e.g. /users/yourname/wormbase
| |
− |
| |
− | 2. Uncompress and untar all of the database.*.tar.gz files into
| |
− | this directory. You will need approximately 2-3 GB of disk space.
| |
− |
| |
− | 3. Obtain and install a suitable acedb binary for your system
| |
− | (available from www.acedb.org).
| |
− |
| |
− | 4. Use the acedb 'xace' program to open your database, e.g.
| |
− | type 'xace /users/yourname/wormbase' at the command prompt.
| |
− |
| |
− | 5. See the acedb website for more information about acedb and
| |
− | using xace.
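Steps 1, 2 and 4 of the guide above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names `unpack_release` and `xace_command` are hypothetical, the archives are assumed to be already downloaded into a local directory, and an acedb binary providing `xace` is assumed to be installed separately (step 3). The script only builds the `xace` command line; it does not run it.

```python
import tarfile
from pathlib import Path


def unpack_release(download_dir, install_dir, pattern="database.*.tar.gz"):
    """Extract every matching archive into install_dir (steps 1 and 2).

    Returns the names of the archives that were extracted.
    """
    target = Path(install_dir)
    target.mkdir(parents=True, exist_ok=True)   # step 1: create the directory
    archives = sorted(Path(download_dir).glob(pattern))
    # Use the safer "data" extraction filter where this Python provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target, **extra)     # step 2: uncompress and untar
    return [archive.name for archive in archives]


def xace_command(install_dir):
    """Build the command from step 4; the caller decides whether to run it."""
    return ["xace", str(Path(install_dir))]


if __name__ == "__main__":
    # Example paths are placeholders, following the guide's own example.
    done = unpack_release(".", "/users/yourname/wormbase")
    print("extracted:", done)
    print("to open the database, run:", " ".join(xace_command("/users/yourname/wormbase")))
```

The command list returned by `xace_command` can be passed to `subprocess.run` once a suitable acedb binary is on the `PATH`.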
| |
− |
| |
− | ____________ END _____________
| |