WS224

From WormBaseWiki
Revision as of 16:15, 4 March 2011 by Matuli
New release of WormBase WS224, Wormpep224 and Wormrna224 Fri Mar  4 14:58:13 GMT 2011


WS224 was built by Mary Ann Tuli
-===================================================================================-
The WS224 build directory includes:
genomes DIR              -  contains a sub dir for each WormBase species with sequence, gff, and agp data
        genomes/b_malayi:        - genome_feature_tables/	sequences/
        genomes/c_brenneri:      - genome_feature_tables/	sequences/
        genomes/c_briggsae:      - genome_feature_tables/	sequences/
        genomes/c_elegans:       - annotation/  genome_feature_tables/  sequences/
        genomes/c_japonica:      - genome_feature_tables/	sequences/
        genomes/c_remanei:       - genome_feature_tables/	sequences/
        genomes/h_bacteriophora: - genome_feature_tables/	sequences/
        genomes/h_contortus:     - genome_feature_tables/	sequences/
        genomes/m_hapla:         - genome_feature_tables/	sequences/
        genomes/m_incognita:     - sequences/
        genomes/p_pacificus:     - genome_feature_tables/	sequences/
          *annotation/                    - contains additional annotations
      i) confirmed_genes.WS224.gz  - DNA sequences of all genes confirmed by EST &/or cDNA
     ii) cDNA2orf.WS224.gz         - Latest set of ORF connections to each cDNA (EST, OST, mRNA)
    iii) geneIDs.WS224.gz          - list of all current gene identifiers with CGC & molecular names (when known)
     iv) PCR_product2gene.WS224.gz - Mappings between PCR products and overlapping Genes
      v) oligo_mapping.gz           - V 
          *genome_feature_tables/         - contains the main .gff files and supplementary .gff data
          *sequences/                     - contains dna/      protein/  rna/  sub dirs
            sequences/protein           - WormBase protein set for species + history etc.
     vi) wormpep224.tar.gz         - full Wormpep distribution corresponding to WS224
    vii) wormrna224.tar.gz         - latest WormRNA release containing non-coding RNAs in the genome
   viii) best_blastp_hits_species.WS224.gz  - for each C. elegans WormPep protein, lists Best blastp match to
                        human, fly, yeast, C. briggsae, and SwissProt & TrEMBL proteins.
            sequences/dna               - WormBase DNA data: genomic sequence (raw, soft_masked, masked), agp
     ix) intergenic_sequences.dna.gz
            sequences/rna               - WormBase rna gene data.
acedb DIR                -  Everything needed to generate a local copy of the primary database
      x) database.WS224.*.tar.gz   - compressed acedb database for new release
     xi) models.wrm.WS224          - the latest database schema (also in above database files)
    xii) WS224-WS223.dbcomp   - log file reporting difference from last release
          *Non_C_elegans_BLASTX/          - This directory contains the blastx data for non-elegans species
                                                    (reduces the size of the main database)
COMPARATIVE_ANALYSIS DIR - compara.tar.bz2 wormpep217_clw.sql.bz2
ONTOLOGY DIR             - gene_associations, obo files (phenotype, GO, anatomy) and their association files


Release notes on the web:

http://www.wormbase.org/wiki/index.php/Release_Schedule

C. elegans Synchronisation with GenBank / EMBL:
No synchronisation issues

C. elegans Chromosomal Changes:
There are no changes to the chromosome sequences in this release.

C. elegans Gene data set (Live C.elegans genes 47387)
Molecular_info               45719 (96.5%)
Concise_description           5736 (12.1%)
Reference                    14138 (29.8%)
WormBase_approved_Gene_name  26116 (55.1%)
RNAi_result                  24628 (52%)
Microarray_results           22118 (46.7%)
SAGE_transcript              19125 (40.4%)

C. elegans Wormpep data set:
There are 20430 CDS in autoace, 25010 when counting 4580 alternate splice forms.
The 25010 sequences contain 10,980,936 base pairs in total.

Modified entries      48
Deleted entries       46
New entries           97
Reappeared entries     2
Net change           +53

The difference between the total CDSs of this build (25010) and the last build (24959) does not equal the net change 53. Please investigate!

C. elegans Genome sequence composition:
        WS224       WS223       change
a       32367418    32367418    +0
c       17780787    17780787    +0
g       17756985    17756985    +0
t       32367086    32367086    +0
n       0           0           +0
-       0           0           +0
Total   100272276   100272276   +0

Pristionchus pacificus Genome sequence composition:
172773083 total
a 43813958
c 32811034
g 32828589
t 43810996
- 0
n 19508506

Caenorhabditis remanei Genome sequence composition:
145500347 total
a 42927857
c 26293828
g 26276020
t 42923178
- 0
n 7079464

Caenorhabditis japonica Genome sequence composition:
163282347 total
a 39053092
c 25603225
g 25576971
t 39126103
- 0
n 33922956

Caenorhabditis briggsae Genome sequence composition:
108419768 total
a 32984239
c 19684682
g 19693545
t 33054090
- 0
n 3003212

Caenorhabditis brenneri Genome sequence composition:
190487923 total
a 52239259
c 32853644
g 32897666
t 52181360
- 0
n 20315994

Tier II Gene counts
pristionchus Gene count 24216 (Coding 24216)
remanei      Gene count 32431 (Coding 31471)
japonica     Gene count 27177 (Coding 25870)
briggsae     Gene count 23052 (Coding 21963)
brenneri     Gene count 32260 (Coding 30670)
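
The C. elegans figures reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the WormBase build scripts. It assumes that the Wormpep net change is defined as new plus reappeared minus deleted entries (an assumption, though it reproduces the reported +53), and the variable and function names are illustrative only.

```python
# Figures copied from the C. elegans sections of these release notes.
wormpep = {"new": 97, "reappeared": 2, "deleted": 46}
cds_ws224 = 25010
cds_ws223 = 24959

composition = {"a": 32367418, "c": 17780787, "g": 17756985,
               "t": 32367086, "n": 0, "-": 0}
reported_total = 100272276


def net_change(counts):
    """Assumed definition: entries gained minus entries lost."""
    return counts["new"] + counts["reappeared"] - counts["deleted"]


def gc_fraction(counts):
    """G+C as a fraction of the unambiguous a/c/g/t bases."""
    acgt = sum(counts[base] for base in "acgt")
    return (counts["g"] + counts["c"]) / acgt


# Compare the net change implied by the entry counts with the build totals.
expected_change = net_change(wormpep)
observed_change = cds_ws224 - cds_ws223
print(f"net change from entry counts: {expected_change:+d}")
print(f"difference between builds:    {observed_change:+d}")
if expected_change != observed_change:
    print(f"unexplained difference: {expected_change - observed_change}")

# Check that the per-base counts add up to the reported genome total.
computed_total = sum(composition.values())
print("composition total matches:", computed_total == reported_total)
print(f"GC fraction: {gc_fraction(composition):.3f}")
```

With these figures the entry counts give +53 while the two builds differ by 51, which is the mismatch the build log flags for investigation.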

Pristionchus pacificus Protein Stats:
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
Confirmed             229 (0.9%)   Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed  4982 (20.6%)  Some, but not all exon bases are covered by transcript evidence
Predicted           19006 (78.5%)  No transcriptional evidence at all

Gene <-> CDS,Transcript,Pseudogene connections
Pristionchus pacificus entries with WormBase-approved Gene name 3078
Caenorhabditis remanei Protein Stats:
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
Confirmed             956 (3.0%)   Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed  5662 (18.0%)  Some, but not all exon bases are covered by transcript evidence
Predicted           24858 (79.0%)  No transcriptional evidence at all

Gene <-> CDS,Transcript,Pseudogene connections
Caenorhabditis remanei entries with WormBase-approved Gene name 5499
Caenorhabditis japonica Protein Stats:
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
Confirmed            1182 (4.6%)   Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed  4974 (19.2%)  Some, but not all exon bases are covered by transcript evidence
Predicted           19714 (76.2%)  No transcriptional evidence at all

Gene <-> CDS,Transcript,Pseudogene connections
Caenorhabditis japonica entries with WormBase-approved Gene name 4834
Caenorhabditis briggsae Protein Stats:
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
Confirmed              53 (0.2%)   Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed   854 (3.9%)   Some, but not all exon bases are covered by transcript evidence
Predicted           21080 (95.9%)  No transcriptional evidence at all

Status of entries: Protein Accessions
UniProtKB accessions 21683 (98.6%)

Gene <-> CDS,Transcript,Pseudogene connections
Caenorhabditis briggsae entries with WormBase-approved Gene name 5544
Caenorhabditis brenneri Protein Stats:
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
Confirmed            1512 (4.9%)   Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed  5635 (18.4%)  Some, but not all exon bases are covered by transcript evidence
Predicted           23526 (76.7%)  No transcriptional evidence at all

Gene <-> CDS,Transcript,Pseudogene connections
Caenorhabditis brenneri entries with WormBase-approved Gene name 3123
Caenorhabditis elegans Protein Stats:
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
Confirmed           11801 (47.2%)  Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed 11033 (44.1%)  Some, but not all exon bases are covered by transcript evidence
Predicted            2176 (8.7%)   No transcriptional evidence at all

Status of entries: Protein Accessions
UniProtKB accessions 24707 (98.8%)

Status of entries: Protein_ID's in EMBL
Protein_id 24822 (99.2%)

Gene <-> CDS,Transcript,Pseudogene connections
Caenorhabditis elegans entries with WormBase-approved Gene name 24493

C. elegans Operons Stats
Description: These exist as closely spaced gene clusters similar to bacterial operons
| Live Operons     1288 |
| Genes in Operons 3341 |

GO Annotation Stats WS224
GO_codes - used for assigning evidence
IC   Inferred by Curator
IDA  Inferred from Direct Assay
IEA  Inferred from Electronic Annotation
IEP  Inferred from Expression Pattern
IGI  Inferred from Genetic Interaction
IMP  Inferred from Mutant Phenotype
IPI  Inferred from Physical Interaction
ISS  Inferred from Sequence (or Structural) Similarity
NAS  Non-traceable Author Statement
ND   No Biological Data available
RCA  Inferred from Reviewed Computational Analysis
TAS  Traceable Author Statement

Total number of Gene::GO connections: 252001

Genes Stats:
Genes with GO_term connections  86456
IEA GO_code present             80572
non-IEA GO_code present          5880

Source of the mapping data
Source: *RNAi (GFF mapping overlaps)    24507
        *citace                          2267
        *Inherited (motif & phenotype)  15090

GO_terms Stats:
Total No. GO_terms                     30483
GO_terms connected to Genes             3336
GO annotations connected with IEA       1856
GO annotations connected with non-IEA   1474
Breakdown  IC - 3  IDA - 375  ISS - 136  IEP - 9  IGI - 120  IMP - 736  IPI - 72  NAS - 1  ND - 1  RCA - 0  TAS - 20
-===================================================================================-
Useful Stats:
Genes with Sequence and CGC name WS224 46571 (24493 elegans / 5544 briggsae / 5499 remanei / 4834 japonica / 3123 brenneri / 3078 pristionchus)
-===================================================================================-
New Data:
Genome sequence updates:
Caenorhabditis briggsae
This build includes a new genome assembly (cb4) for C. briggsae (Haag Laboratory, Department of Biology, University of Maryland). Briefly, 167 AF16/HK104 advanced-intercross recombinant inbred lines were successfully genotyped at 1,032 single nucleotide polymorphism (SNP) markers. The resulting data were used to estimate high-density genetic maps. Sequences were assembled by combining the physical (cb25 assembly; [Stein et al. 2003 PMID:14624247]) and genetic positions of the SNPs to inform the process of ordering and orienting the cb25 supercontigs into chromosome assemblies. Detailed methods will be published subsequently [Ross JA et al., 2011 PLOS Genetics, in review].

New Fixes:

Known Problems:
1. An ACeDB patch containing the missing orthology and paralogy data can be found at:
   ftp://ftp.sanger.ac.uk/pub2/wormbase/WS224/acedb/patch/compara.ace.bz2
2. For technical reasons, Heterorhabditis bacteriophora ESTs were excluded from the BLAT alignments of C. briggsae and C. elegans. We intend to add them back in the next frozen release, WS225.

Other Changes:

Proposed Changes / Forthcoming Data:
Trichinella
WS225 will include the draft Trichinella spiralis genome of Mitreva et al. Connections to orthologous Trichinella spiralis proteins will be included for all genes, as well as an annotated Genome Browser.

Model Changes:
This cycle sees the introduction of the WBProcess class model.

For more info mail worm@sanger.ac.uk
-===================================================================================-
Quick installation guide for UNIX/Linux systems
1. Create a new directory to contain your copy of WormBase, e.g. /users/yourname/wormbase
2. Unpack and untar all of the database.*.tar.gz files into this directory. You will need approximately 2-3 Gb of disk space.
3. Obtain and install a suitable acedb binary for your system (available from www.acedb.org).
4. Use the acedb 'xace' program to open your database, e.g. type 'xace /users/yourname/wormbase' at the command prompt.
5. See the acedb website for more information about acedb and using xace.

____________ END _____________
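
Steps 1 and 2 of the quick installation guide can be scripted. The following is a minimal sketch in Python using only the standard library; it is not an official WormBase tool. The function name unpack_database and both directory arguments are hypothetical, and it assumes the database.*.tar.gz files have already been downloaded from a trusted source into a single directory.

```python
import glob
import os
import tarfile


def unpack_database(archive_dir, target_dir):
    """Extract every database.*.tar.gz archive in archive_dir into target_dir.

    Returns the sorted list of archive paths that were unpacked.
    """
    # Step 1: create the directory that will hold the local copy.
    os.makedirs(target_dir, exist_ok=True)
    # Step 2: unpack each archive of the multi-part database into it.
    archives = sorted(glob.glob(os.path.join(archive_dir, "database.*.tar.gz")))
    for path in archives:
        with tarfile.open(path, "r:gz") as tar:
            tar.extractall(target_dir)
    return archives

# Example call (hypothetical paths, adjust to your own setup):
#   unpack_database("downloads/WS224/acedb", "/users/yourname/wormbase")
```

After unpacking, the database is opened with xace as described in step 4.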