WS179
Latest revision as of 10:53, 21 December 2011
Release Notes
New release of WormBase WS179, Wormpep179 and Wormrna179 Mon Jul 30 12:21:01 BST 2007

WS179 was built by [Paul]
======================================================================

This directory includes:
i) database.WS179.*.tar.gz - compressed data for new release
ii) models.wrm.WS179 - the latest database schema (also in above database files)
iii) CHROMOSOMES/subdir - contains 3 files (DNA, GFF & AGP per chromosome)
iv) WS179-WS178.dbcomp - log file reporting difference from last release
v) wormpep179.tar.gz - full Wormpep distribution corresponding to WS179
vi) wormrna179.tar.gz - latest WormRNA release containing non-coding RNA's in the genome
vii) confirmed_genes.WS179.gz - DNA sequences of all genes confirmed by EST &/or cDNA
viii) cDNA2orf.WS179.gz - Latest set of ORF connections to each cDNA (EST, OST, mRNA)
ix) gene_interpolated_map_positions.WS179.gz - Interpolated map positions for each coding/RNA gene
x) clone_interpolated_map_positions.WS179.gz - Interpolated map positions for each clone
xi) best_blastp_hits.WS179.gz - for each C. elegans WormPep protein, lists Best blastp match to
    human, fly, yeast, C. briggsae, and SwissProt & TrEMBL proteins.
xii) best_blastp_hits_brigprot.WS179.gz - for each C. briggsae protein, lists Best blastp match to
    human, fly, yeast, C. elegans, and SwissProt & TrEMBL proteins.
xiii) geneIDs.WS179.gz - list of all current gene identifiers with CGC & molecular names (when known)
xiv) PCR_product2gene.WS179.gz - Mappings between PCR products and overlapping Genes


Release notes on the web:
-------------------------
http://www.wormbase.org/wiki/index.php/Release_notes


Genome sequence composition:
----------------------------

        WS179       WS178       change
----------------------------------------------
a       32365888    32365888    +0
c       17779857    17779856    +1
g       17756016    17756017    -1
t       32365689    32365689    +0
n       0           0           +0
-       0           0           +0

Total   100267450   100267450   +0


Chromosomal Changes:
--------------------
Chromosome: IV  16601824 16601823 0 -> 16601824 16601824 1
Chromosome: V   14444487 14444487 1 -> 14444487 14444486 0


Gene data set (Live C.elegans genes 24133)
------------------------------------------
Molecular_info          22426 (92.9%)
Concise_description     4803 (19.9%)
Reference               7211 (29.9%)
CGC_approved Gene name  9332 (38.7%)
RNAi_result             19883 (82.4%)
Microarray_results      19234 (79.7%)
SAGE_transcript         18646 (77.3%)


Wormpep data set:
----------------------------

There are 20150 CDS in autoace, 23481 when counting 3331 alternate splice forms.

The 23481 sequences contain base pairs in total.

Modified entries      28
Deleted entries       0
New entries           35
Reappeared entries    1

Net change  +36


Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed             7456 (31.8%)    Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed   11327 (48.2%)   Some, but not all exon bases are covered by transcript evidence
Predicted             4698 (20.0%)    No transcriptional evidence at all


Status of entries: Protein Accessions
-------------------------------------
UniProtKB/Swiss-Prot accessions   3557 (15.1%)
UniProtKB/TrEMBL accessions       19641 (83.6%)


Status of entries: Protein_ID's in EMBL
---------------------------------------
Protein_id   23198 (98.8%)


Gene <-> CDS,Transcript,Pseudogene connections (cgc-approved)
---------------------------------------------
Entries with CGC-approved Gene name   7697


GeneModel correction progress WS178 -> WS179
-----------------------------------------
Confirmed introns not in a CDS gene model;

          +---------+--------+
          | Introns | Change |
          +---------+--------+
Cambridge |    3543 |   3475 |
St Louis  |    3556 |   3400 |
          +---------+--------+

Members of known repeat families that overlap predicted exons;

          +---------+--------+
          | Repeats | Change |
          +---------+--------+
Cambridge |       6 |      0 |
St Louis  |       6 |      0 |
          +---------+--------+


Synchronisation with GenBank / EMBL:
------------------------------------
CHROMOSOME_IV sequence AL132952
CHROMOSOME_V sequence Z72508
CHROMOSOME_V sequence Z77659
CHROMOSOME_X sequence Z67755


There are no gaps remaining in the genome sequence
---------------
For more info mail help@wormbase.org
-===================================================================================-


New Data:
---------
1) There is a new set of GFF files of anomalies which curators may take
into account when curating gene structure. The existence of one or
more of these anomalies near a gene does not mean that there is
necessarily anything wrong with the existing gene structure.
SUPPLEMENTARY_GFF/CHROMOSOME_*_curation_anomalies.gff


Genome sequence updates:
-----------------------

|--------|-----------|-----------------------|--------|-----------------------|
| Clone  | Type      | Flank                 | Change | Flank                 |
|--------|-----------|-----------------------|--------|-----------------------|
| F23B12 | deletion  | tggcttccaacgtgg       | G      | atcttctggagatg        |
| Y51H4A | insertion | caggaccagctggatcacctg | G      | gcgatcaggacaaccaggatc |
|--------|-----------|-----------------------|--------|-----------------------|


New Fixes:
----------
1) WABA has been rerun and standardised for both elegans and briggsae.
Previous WABA datasets contained data from different molecule types,
clones vs chromosomes or chromosomes vs scaffolds etc. All old cb25 data
has been removed as previous releases contained old and new. We now have:

   elegans_chromosomes::briggsaeCB3_chromosomes
   briggsaeCB3_chromosomes::elegans_chromosomes


Known Problems:
---------------


Other Changes:
--------------


Proposed Changes / Forthcoming Data:
-------------------------------------
1) ~5000 21U RNA genes are going to be in WS180


Model Changes:
------------------------------------

-===================================================================================-


Quick installation guide for UNIX/Linux systems
-----------------------------------------------

1. Create a new directory to contain your copy of WormBase,
   e.g. /users/yourname/wormbase

2. Unpack and untar all of the database.*.tar.gz files into
   this directory. You will need approximately 2-3 Gb of disk space.

3. Obtain and install a suitable acedb binary for your system
   (available from www.acedb.org).

4. Use the acedb 'xace' program to open your database, e.g.
   type 'xace /users/yourname/wormbase' at the command prompt.

5. See the acedb website for more information about acedb and
   using xace.

____________ END _____________
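Steps 1 and 2 of the quick installation guide could be scripted roughly as follows. This is only a minimal sketch in Python, not part of the official release tooling: the function name `unpack_release` and both directory arguments are hypothetical, and it assumes the database.*.tar.gz files have already been downloaded to a local directory.

```python
import glob
import os
import tarfile


def unpack_release(download_dir, target_dir):
    """Extract every database.*.tar.gz archive found in download_dir into target_dir.

    Both arguments are hypothetical paths chosen by the user; target_dir
    plays the role of /users/yourname/wormbase in the guide above.
    Returns the list of archives that were unpacked.
    """
    # Step 1: create the directory that will hold the local copy of WormBase
    os.makedirs(target_dir, exist_ok=True)

    # Step 2: unpack and untar all of the database.*.tar.gz files
    archives = sorted(glob.glob(os.path.join(download_dir, "database.*.tar.gz")))
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target_dir)
    return archives
```

A call such as `unpack_release("/users/yourname/downloads", "/users/yourname/wormbase")` (both paths are placeholders) would leave the directory ready for step 4, where xace is pointed at it. Installing the acedb binary (step 3) is not covered by this sketch.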
Known Problems
WABA
Problem
The WABA data involving chrIII of C. briggsae is missing from both the ACeDB database and the GFF files.
Fix
A tar file with the missing data is available from the Sanger FTP server.
It contains one patch file for the ACeDB database (chrIII_waba_fix.ace) and one for the GFF files (chrIII_waba_fix.gff).
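Before applying the fix, it may help to confirm that the downloaded tar file really contains both patch files. The following Python sketch does only that. The two file names come from the note above, but the archive name and location are not stated here, so `archive_path` is a placeholder, and the function name `extract_waba_patch` is hypothetical. Loading the .ace file into ACeDB is done with the acedb tools and is not shown.

```python
import os
import tarfile

# File names as given in the Fix note; the archive name itself is not given there
EXPECTED_PATCH_FILES = ("chrIII_waba_fix.ace", "chrIII_waba_fix.gff")


def extract_waba_patch(archive_path, target_dir):
    """Extract the two patch files and return a dict of {file name: extracted path}.

    archive_path is a placeholder for wherever the tar file was saved.
    Raises FileNotFoundError if either expected file is absent.
    """
    os.makedirs(target_dir, exist_ok=True)
    found = {}
    with tarfile.open(archive_path) as tar:  # compression is auto-detected
        for member in tar.getmembers():
            base = os.path.basename(member.name)
            if member.isfile() and base in EXPECTED_PATCH_FILES:
                member.name = base  # drop any leading directory inside the archive
                tar.extract(member, target_dir)
                found[base] = os.path.join(target_dir, base)
    missing = [name for name in EXPECTED_PATCH_FILES if name not in found]
    if missing:
        raise FileNotFoundError("patch archive lacks: " + ", ".join(missing))
    return found
```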