WS217
New release of WormBase WS217, Wormpep217 and Wormrna217 Wed Jul 28 09:25:17 BST 2010

WS217 was built by Gary Williams
-===================================================================================-

This directory includes:
i)    database.WS217.*.tar.gz - compressed data for new release
ii)   models.wrm.WS217 - the latest database schema (also in above database files)
iii)  CHROMOSOMES/subdir - contains 3 files (DNA, GFF & AGP per chromosome)
iv)   WS217-WS216.dbcomp - log file reporting differences from last release
v)    wormpep217.tar.gz - full Wormpep distribution corresponding to WS217
vi)   wormrna217.tar.gz - latest WormRNA release containing non-coding RNAs in the genome
vii)  confirmed_genes.WS217.gz - DNA sequences of all genes confirmed by EST &/or cDNA
viii) cDNA2orf.WS217.gz - latest set of ORF connections to each cDNA (EST, OST, mRNA)
ix)   gene_interpolated_map_positions.WS217.gz - interpolated map positions for each coding/RNA gene
x)    clone_interpolated_map_positions.WS217.gz - interpolated map positions for each clone
xi)   best_blastp_hits.WS217.gz - for each C. elegans WormPep protein, lists best blastp match to
      human, fly, yeast, C. briggsae, and SwissProt & TrEMBL proteins
xii)  best_blastp_hits_brigprot.WS217.gz - for each C. briggsae protein, lists best blastp match to
      human, fly, yeast, C. elegans, and SwissProt & TrEMBL proteins
xiii) geneIDs.WS217.gz - list of all current gene identifiers with CGC & molecular names (when known)
xiv)  PCR_product2gene.WS217.gz - Mappings between PCR products and overlapping Genes


Release notes on the web:
-------------------------
http://www.wormbase.org/wiki/index.php/Release_Schedule


Synchronisation with GenBank / EMBL:
------------------------------------

CHROMOSOME_I sequence AF067219
CHROMOSOME_II sequence U29244


Genome sequence composition:
----------------------------

        WS217       WS216       change
----------------------------------------------
a       32367418    32367418    +0
c       17780787    17780787    +0
g       17756985    17756985    +0
t       32367086    32367086    +0
n       0           0           +0
-       0           0           +0

Total   100272276   100272276   +0


Chromosomal Changes:
--------------------
There are no changes to the chromosome sequences in this release.


Gene data set (Live C.elegans genes 40183)
------------------------------------------
Molecular_info              38499 (95.8%)
Concise_description          5669 (14.1%)
Reference                   14039 (34.9%)
WormBase_approved Gene name 25996 (64.7%)
RNAi_result                 22928 (57.1%)
Microarray_results          21075 (52.4%)
SAGE_transcript             19147 (47.6%)


Wormpep data set:
----------------------------

There are 20387 CDS in autoace, 24705 when counting 4318 alternate splice forms.

The 24705 sequences contain 10,879,267 base pairs in total.

Modified entries      47
Deleted entries       14
New entries           67
Reappeared entries     2

Net change           +55


Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed            10162 (41.1%)  Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed  11800 (47.8%)  Some, but not all exon bases are covered by transcript evidence
Predicted             2743 (11.1%)  No transcriptional evidence at all


Status of entries: Protein Accessions
-------------------------------------
UniProtKB accessions 24505 (99.2%)


Status of entries: Protein_ID's in EMBL
---------------------------------------
Protein_id           24505 (99.2%)


Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis elegans entries with WormBase-approved Gene name 24360


C. elegans Operons Stats
---------------------------------------------
Description: These exist as closely spaced gene clusters similar to bacterial operons
---------------------------------------------
| Live Operons        1267 |
| Genes in Operons    3268 |


GO Annotation Stats WS217
--------------------------------------

GO_codes - used for assigning evidence
--------------------------------------
IC   Inferred by Curator
IDA  Inferred from Direct Assay
IEA  Inferred from Electronic Annotation
IEP  Inferred from Expression Pattern
IGI  Inferred from Genetic Interaction
IMP  Inferred from Mutant Phenotype
IPI  Inferred from Physical Interaction
ISS  Inferred from Sequence (or Structural) Similarity
NAS  Non-traceable Author Statement
ND   No Biological Data available
RCA  ?
TAS  Traceable Author Statement
------------------------------------------------

Total number of Gene::GO connections: 262140

Genes Stats:
----------------
Genes with GO_term connections   88755
    IEA GO_code present          82558
    non-IEA GO_code present       6194

Source of the mapping data
Source:
    *RNAi (GFF mapping overlaps)     23034
    *citace                           2046
    *Inherited (motif & phenotype)   15016

GO_terms Stats:
---------------
Total No. GO_terms                       30460
GO_terms connected to Genes               3195
GO annotations connected with IEA         1833
GO annotations connected with non-IEA     1359

Breakdown
    IC  - 2
    IDA - 331
    ISS - 123
    IEP - 9
    IGI - 114
    IMP - 694
    IPI - 63
    NAS - 1
    ND  - 1
    RCA - 0
    TAS - 21
------------------------------------------------


Tier II Gene counts
---------------------------------------------
pristionchus    Gene count 29638 (Coding 29639)
remanei         Gene count 32431 (Coding 31476)
heterorhabditis Gene count 0     (Coding 0)
japonica        Gene count 27177 (Coding 25870)
briggsae        Gene count 23044 (Coding 21997)
brenneri        Gene count 32288 (Coding 30663)
---------------------------------------------

-===================================================================================-

New Data:
---------

WGS Data

Data from two WGS projects has been submitted to WormBase:
    57 alleles from Sarin et al PMID 20439776 (all of biological interest).
    2723 alleles from Flibotte et al PMID 20439774 (1633 of biological interest).
We are establishing a pipeline for submitting the large quantities of WGS data we expect,
so this data is an initial representation that may change over time.

Orthology and OMIM

Orthology predictions to 50 eukaryotes were included, based on EnsEMBL release 58 and the
frozen WormBase release WS210. In addition, information on inherited diseases in human was
updated from OMIM.

Based on user feedback, the WormBase GFF2 files now include the public names of variations
and RNAi experiments. In addition, non-coding RNAs now show the same information as their
coding equivalents.

Blast databases updated
    ipi_human
    flybase
    yeast


Genome sequence updates:
-----------------------
None


New Fixes:
----------
None


Known Problems:
---------------
None


Other Changes:
--------------
None


Proposed Changes / Forthcoming Data:
-------------------------------------


Model Changes:
------------------------------------
Added Brief_id to ?Position_Matrix for Xiaodong and ?Molecule class for Karen


For more info mail worm@sanger.ac.uk
-===================================================================================-


Quick installation guide for UNIX/Linux systems
-----------------------------------------------

1. Create a new directory to contain your copy of WormBase,
   e.g. /users/yourname/wormbase

2. Unpack and untar all of the database.*.tar.gz files into
   this directory. You will need approximately 2-3 Gb of disk space.

3. Obtain and install a suitable acedb binary for your system
   (available from www.acedb.org).

4. Use the acedb 'xace' program to open your database, e.g.
   type 'xace /users/yourname/wormbase' at the command prompt.

5. See the acedb website for more information about acedb and
   using xace.

____________ END _____________
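
Steps 1 and 2 of the quick installation guide can be scripted. The following is a minimal Python sketch, not part of the WormBase distribution: the function name unpack_release and the idea of keeping the downloaded archives in a separate directory are assumptions made for illustration. It creates the target directory and extracts every database.*.tar.gz archive into it.

```python
import tarfile
from pathlib import Path


def unpack_release(archive_dir, target_dir):
    """Extract every database.*.tar.gz in archive_dir into target_dir.

    Hypothetical helper for illustration only. Returns the names of the
    archives that were unpacked, in sorted order.
    """
    target = Path(target_dir)
    # Step 1: create the directory that will hold the database.
    target.mkdir(parents=True, exist_ok=True)

    unpacked = []
    # Step 2: unpack and untar each database archive into that directory.
    for archive in sorted(Path(archive_dir).glob("database.*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
        unpacked.append(archive.name)
    return unpacked
```

Calling unpack_release("downloads/WS217", "/users/yourname/wormbase"), where the first path is a placeholder for wherever the archives were downloaded, would leave the database ready to open with 'xace /users/yourname/wormbase' as described in step 4. Only extract archives obtained from a source you trust.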