WS219


Release Letter

New release of WormBase WS219, Wormpep219 and Wormrna219 Tue Oct  5 16:04:14 BST 2010


WS219 was built by [Michael Paulini (michael.paulini@wormbase.org)]
-===================================================================================-
The WS219 build directory includes:
genomes DIR              -  contains a sub dir for each WormBase species with sequence, gff, and agp data
        genomes/b_malayi:        - genome_feature_tables/	sequences/
        genomes/c_brenneri:      - genome_feature_tables/	sequences/
        genomes/c_briggsae:      - genome_feature_tables/	sequences/
        genomes/c_elegans:       - annotation/  genome_feature_tables/  sequences/
        genomes/c_japonica:      - genome_feature_tables/	sequences/
        genomes/c_remanei:       - genome_feature_tables/	sequences/
        genomes/h_bacteriophora: - genome_feature_tables/	sequences/
        genomes/h_contortus:     - genome_feature_tables/	sequences/
        genomes/m_hapla:         - genome_feature_tables/	sequences/
        genomes/m_incognita:     - sequences/
        genomes/p_pacificus:     - genome_feature_tables/	sequences/
          *annotation/                    - contains additional annotations
      i) confirmed_genes.WS219.gz  - DNA sequences of all genes confirmed by EST &/or cDNA
     ii) cDNA2orf.WS219.gz         - Latest set of ORF connections to each cDNA (EST, OST, mRNA)
    iii) geneIDs.WS219.gz          - list of all current gene identifiers with CGC & molecular names (when known)
     iv) PCR_product2gene.WS219.gz - Mappings between PCR products and overlapping Genes
      v) oligo_mapping.gz           - V 
          *genome_feature_tables/         - contains the main .gff files and supplementary .gff data
          *sequences/                     - contains dna/      protein/  rna/  sub dirs
            sequences/protein           - WormBase protein set for species + history etc.
     vi) wormpep219.tar.gz         - full Wormpep distribution corresponding to WS219
    vii) wormrna219.tar.gz         - latest WormRNA release containing non-coding RNAs in the genome
   viii) best_blastp_hits_species.WS219.gz  - for each C. elegans WormPep protein, lists Best blastp match to
                        human, fly, yeast, C. briggsae, and SwissProt & TrEMBL proteins.
            sequences/dna               - WormBase DNA data: genomic sequence (raw, soft_masked, masked), agp
     ix) intergenic_sequences.dna.gz
            sequences/rna               - WormBase rna gene data.
acedb DIR                -  Everything needed to generate a local copy of the primary database
      x) database.WS219.*.tar.gz   - compressed acedb database for new release
     xi) models.wrm.WS219          - the latest database schema (also in above database files)
    xii) WS219-WS218.dbcomp   - log file reporting difference from last release
          *Non_C_elegans_BLASTX/          - This directory contains the blastx data for non-elegans species
                                                    (reduces the size of the main database)
COMPARATIVE_ANALYSIS DIR - compara.tar.bz2 wormpep217_clw.sql.bz2
ONTOLOGY DIR             - gene_associations, obo files (phenotype, GO, anatomy) and their association files


Release notes on the web:
-------------------------
http://www.wormbase.org/wiki/index.php/Release_Schedule




C. elegans Synchronisation with GenBank / EMBL:
------------------------------------

No synchronisation issues


C. elegans Chromosomal Changes:
--------------------
There are no changes to the chromosome sequences in this release.


C. elegans Gene data set (Live C.elegans genes 40117)
------------------------------------------
Molecular_info              38438 (95.8%)
Concise_description         5692 (14.2%)
Reference                   14098 (35.1%)
WormBase_approved_Gene_name 26062 (65%)
RNAi_result                 22875 (57%)
Microarray_results          21042 (52.5%)
SAGE_transcript             19152 (47.7%)


C. elegans Wormpep data set:
----------------------------

There are 20408 CDS in autoace, 24826 when counting 4418 alternate splice forms.

The 24826 sequences contain 10,921,490 base pairs in total.

Modified entries      25
Deleted entries       36
New entries           101
Reappeared entries    1

Net change  +66
The difference between the total CDSs of this build (24826) and the last build (24761) does not equal the net change of 66.
Please investigate!
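
The mismatch flagged above can be reproduced from the figures in this section. The following is a minimal Python sketch, assuming the net change is computed as new plus reappeared minus deleted entries (the letter does not state the formula, and the variable names are illustrative only):

```python
# Figures copied from the Wormpep section of this release letter.
new_entries = 101
reappeared_entries = 1
deleted_entries = 36
cds_this_build = 24826
cds_last_build = 24761

# Assumed formula: modified entries do not change the total count.
net_change = new_entries + reappeared_entries - deleted_entries
build_difference = cds_this_build - cds_last_build

print(f"net change from entry counts: {net_change:+d}")
print(f"difference between builds:    {build_difference:+d}")
print(f"unexplained discrepancy:      {net_change - build_difference:+d}")
```

Under that assumption the entry counts give +66 while the build totals differ by 65, so one entry is unaccounted for.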


C. elegans Genome sequence composition:
----------------------------

       	WS219       	WS218      	change
----------------------------------------------
a    	32367418	32367418	  +0
c    	17780787	17780787	  +0
g    	17756985	17756985	  +0
t    	32367086	32367086	  +0
n    	0       	0       	  +0
-    	0       	0       	  +0

Total	100272276	100272276	  +0
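
As a quick consistency check, the per-base counts in the table can be summed and compared with the reported total. The sketch below also derives the GC fraction, which is not reported in this letter and is shown only as an illustration of what the counts allow.

```python
# Base counts for WS219, copied from the table above.
counts = {"a": 32367418, "c": 17780787, "g": 17756985, "t": 32367086, "n": 0, "-": 0}
reported_total = 100272276

total = sum(counts.values())
# Derived value (not in the release letter): fraction of G and C bases.
gc_fraction = (counts["g"] + counts["c"]) / total

print(f"summed bases:   {total}")
print(f"matches report: {total == reported_total}")
print(f"GC content:     {gc_fraction:.2%}")
```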


Pristionchus pacificus Genome sequence composition:
----------------------------
 169822619 total
 a 41799168
 c 31168435
 g 31196239
 t 41802890
 - 0
 n 23855887


Caenorhabditis remanei Genome sequence composition:
----------------------------
 145500347 total
 a 42927857
 c 26293828
 g 26276020
 t 42923178
 - 0
 n 7079464


Caenorhabditis japonica Genome sequence composition:
----------------------------
 163282347 total
 a 39053092
 c 25603225
 g 25576971
 t 39126103
 - 0
 n 33922956


Caenorhabditis briggsae Genome sequence composition:
----------------------------
 108478630 total
 a 33004189
 c 19675861
 g 19707411
 t 33049803
 - 59500
 n 2981866


Caenorhabditis brenneri Genome sequence composition:
----------------------------
 190426650 total
 a 52223359
 c 32838518
 g 32883836
 t 52164943
 - 0
 n 20315994




Tier II Gene counts
---------------------------------------------
pristionchus Gene count 29638 (Coding 29639)
remanei Gene count 32431 (Coding 31476)
heterorhabditis Gene count 0 (Coding 0)
japonica Gene count 27177 (Coding 25870)
briggsae Gene count 23043 (Coding 21995)
brenneri Gene count 32295 (Coding 30707)
---------------------------------------------




-------------------------------------------------
Pristionchus pacificus Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed               425 (1.4%)	Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed    5309 (17.9%)	Some, but not all exon bases are covered by transcript evidence
Predicted             23905 (80.7%)	No transcriptional evidence at all



Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Pristionchus pacificus entries with WormBase-approved Gene name   2792




-------------------------------------------------
Caenorhabditis remanei Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed               955 (3.0%)	Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed    5662 (18.0%)	Some, but not all exon bases are covered by transcript evidence
Predicted             24859 (79.0%)	No transcriptional evidence at all



Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis remanei entries with WormBase-approved Gene name   5461




-------------------------------------------------
Caenorhabditis japonica Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed              1182 (4.6%)	Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed    4974 (19.2%)	Some, but not all exon bases are covered by transcript evidence
Predicted             19714 (76.2%)	No transcriptional evidence at all



Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis japonica entries with WormBase-approved Gene name   4791




-------------------------------------------------
Caenorhabditis briggsae Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed                52 (0.2%)	Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed     856 (3.9%)	Some, but not all exon bases are covered by transcript evidence
Predicted             21087 (95.9%)	No transcriptional evidence at all



Status of entries: Protein Accessions
-------------------------------------
UniProtKB accessions  21708 (98.7%)

Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis briggsae entries with WormBase-approved Gene name   5497




-------------------------------------------------
Caenorhabditis brenneri Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed              1517 (4.9%)	Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed    5638 (18.4%)	Some, but not all exon bases are covered by transcript evidence
Predicted             23550 (76.7%)	No transcriptional evidence at all



Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis brenneri entries with WormBase-approved Gene name   3091




-------------------------------------------------
Caenorhabditis elegans Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed             11654 (46.9%)	Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed   10997 (44.3%)	Some, but not all exon bases are covered by transcript evidence
Predicted              2175 (8.8%)	No transcriptional evidence at all
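
The percentages in these confidence tables can be recomputed from the counts. This is a small sketch, assuming the denominator is the sum of the three status classes (for C. elegans this equals the 24826 CDS reported in the Wormpep section); the dictionary name is illustrative.

```python
# C. elegans counts copied from the table above.
status_counts = {
    "Confirmed": 11654,
    "Partially_confirmed": 10997,
    "Predicted": 2175,
}

total = sum(status_counts.values())
# Percentage of each class, rounded to one decimal place as in the table.
percentages = {name: round(100 * n / total, 1) for name, n in status_counts.items()}

for name, n in status_counts.items():
    print(f"{name:<20}{n:>6} ({percentages[name]}%)")
print(f"{'Total':<20}{total:>6}")
```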



Status of entries: Protein Accessions
-------------------------------------
UniProtKB accessions  24631 (99.2%)

Status of entries: Protein_ID's in EMBL
---------------------------------------
Protein_id            24631 (99.2%)

Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis elegans entries with WormBase-approved Gene name  24430



C. elegans Operons Stats
---------------------------------------------
Description: These exist as closely spaced gene clusters similar to bacterial operons
---------------------------------------------
| Live Operons        1288                |
| Genes in Operons    3343                |



GO Annotation Stats WS219
--------------------------------------

GO_codes - used for assigning evidence
--------------------------------------
IC  Inferred by Curator
IDA Inferred from Direct Assay
IEA Inferred from Electronic Annotation
IEP Inferred from Expression Pattern
IGI Inferred from Genetic Interaction
IMP Inferred from Mutant Phenotype
IPI Inferred from Physical Interaction
ISS Inferred from Sequence (or Structural) Similarity
NAS Non-traceable Author Statement
ND  No Biological Data available
RCA Inferred from Reviewed Computational Analysis
TAS Traceable Author Statement
------------------------------------------------

Total number of Gene::GO connections:  262467

Genes Stats:
----------------
Genes with GO_term connections         88452  
           IEA GO_code present         83023  
       non-IEA GO_code present         5425  

Source of the mapping data             
Source: *RNAi (GFF mapping overlaps)   23372  
        *citace                        2133  
        *Inherited (motif & phenotype) 14290  

GO_terms Stats:
---------------
Total No. GO_terms                     30467  
GO_terms connected to Genes            3241  
GO annotations connected with IEA      1838  
GO annotations connected with non-IEA  1399  
   Breakdown  IC - 2   IDA - 348   ISS - 126 
             IEP - 9   IGI - 112   IMP - 711 
             IPI - 68  NAS - 1     ND  - 1  
             RCA - 0   TAS - 21   


-===================================================================================-

Useful Stats:
---------

Genes with Sequence and CGC name
WS219 46062 (24430 elegans / 5497 briggsae / 5461 remanei / 4791 japonica / 3091 brenneri / 2792 pristionchus)


-===================================================================================-



New Data:
---------
There is a new file produced on the Sanger FTP site.

requested by : Megan Senchuk, msenchuk@mcb.harvard.edu

function: a 'nice' view of the phenotypes produced by genes knocked down by RNAi.

script : ONTOLOGY/get_easy_phenotypes.pl

output1 : ftp://ftp.sanger.ac.uk/pub/wormbase/WSXXX/ONTOLOGY/rnai_phenotypes.WSXXX.wb

example lines :

WBGene00001908 F17E9.9 larval arrest WBPhenotype:0000059 WBRNAi00025129|WBPaper00006395

WBGene00001908 F17E9.9 locomotion variant WBPhenotype:0000643 WBRNAi00025129|WBPaper00006395

output2 : ftp://ftp.sanger.ac.uk/pub/wormbase/WSXXX/ONTOLOGY/rnai_phenotypes_quick.WSXXX.wb

example lines :

WBGene00001908 F17E9.9 larval arrest, locomotion variant, embryonic lethal, maternal sterile

WBGene00019433 K06A5.6 embryonic lethal
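
The two file layouts shown above could be read with a short parser. The sketch below is written in Python, not the Perl of the original script, and the field layout is inferred only from the example lines; the exact delimiter used in the real files is an assumption, and the function names are hypothetical.

```python
import re

# Assumed layout of rnai_phenotypes: gene ID, sequence name, phenotype
# description, phenotype ID, then "|"-separated evidence identifiers.
FULL_RE = re.compile(r"^(WBGene\d+)\s+(\S+)\s+(.+?)\s+(WBPhenotype:\d+)\s+(\S+)$")
# Assumed layout of rnai_phenotypes_quick: gene ID, sequence name, then a
# comma-separated list of phenotype descriptions.
QUICK_RE = re.compile(r"^(WBGene\d+)\s+(\S+)\s+(.+)$")


def parse_full_line(line):
    """Parse one rnai_phenotypes line; return None if it does not match."""
    match = FULL_RE.match(line.strip())
    if match is None:
        return None
    gene_id, sequence_name, phenotype, phenotype_id, evidence = match.groups()
    return {
        "gene_id": gene_id,
        "sequence_name": sequence_name,
        "phenotype": phenotype,
        "phenotype_id": phenotype_id,
        "evidence": evidence.split("|"),
    }


def parse_quick_line(line):
    """Parse one rnai_phenotypes_quick line; return None if it does not match."""
    match = QUICK_RE.match(line.strip())
    if match is None:
        return None
    gene_id, sequence_name, phenotypes = match.groups()
    return {
        "gene_id": gene_id,
        "sequence_name": sequence_name,
        "phenotypes": [p.strip() for p in phenotypes.split(",")],
    }


full = parse_full_line(
    "WBGene00001908 F17E9.9 larval arrest WBPhenotype:0000059 WBRNAi00025129|WBPaper00006395"
)
quick = parse_quick_line(
    "WBGene00001908 F17E9.9 larval arrest, locomotion variant, embryonic lethal, maternal sterile"
)
print(full)
print(quick)
```

Matching on the identifier patterns rather than splitting on whitespace keeps multi-word phenotype names such as "larval arrest" intact.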


Genome sequence updates:
-----------------------
The C. brenneri assembly had contaminating sequences removed and was updated to the latest version
available in GenBank/EMBL-Bank/DDBJ.

New Fixes:
----------
C.brenneri genes on the removed sequences were killed.

Known Problems:
---------------


Other Changes:
--------------

Proposed Changes / Forthcoming Data:
-------------------------------------
?Strain class
Removed:Extended_genotype ?Variation

Comment: Removed this unused tag structure as data can be stored in current
         tags and the additional information/context can be resolved at the
         display level.

Model Changes:
------------------------------------
?Molecule - fix ?Phenotype tags missed from original submission + additions (Karen Y.)

?Strain - Extended_genotype was signed off on the call, but testing revealed errors.
          Dropped the XREF back to Variation.

?Variation removal of obsolete/unused tags (Mary Ann T., Jolene F.)

?Phenotype - Migration of data from "Not" tags continues; additional tags are needed for storing phenotypes that were scored but not observed (Wen C.)


For more info mail help@wormbase.org
-===================================================================================-


Quick installation guide for UNIX/Linux systems
-----------------------------------------------

1. Create a new directory to contain your copy of WormBase,
	e.g. /users/yourname/wormbase

2. Unpack and untar all of the database.*.tar.gz files into
	this directory. You will need approximately 2-3 GB of disk space.

3. Obtain and install a suitable acedb binary for your system
	(available from www.acedb.org).

4. Use the acedb 'xace' program to open your database, e.g.
	type 'xace /users/yourname/wormbase' at the command prompt.

5. See the acedb website for more information about acedb and
	using xace.
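
Steps 1 and 2 above can be scripted. The following is a minimal Python sketch, assuming the database.*.tar.gz files have already been downloaded to a local directory; the function name and both directory arguments are hypothetical, and obtaining the acedb binary (steps 3 to 5) is not covered.

```python
import glob
import os
import tarfile


def unpack_release(download_dir, install_dir):
    """Extract every database.*.tar.gz found in download_dir into install_dir.

    Returns the sorted list of archive paths that were unpacked.
    """
    # Step 1: create the directory that will hold the local copy of WormBase.
    os.makedirs(install_dir, exist_ok=True)
    # Step 2: unpack each database archive into that directory.
    archives = sorted(glob.glob(os.path.join(download_dir, "database.*.tar.gz")))
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            # Only unpack archives obtained from a trusted source.
            tar.extractall(install_dir)
    return archives
```

For example, calling unpack_release("/users/yourname/downloads", "/users/yourname/wormbase") would leave the database ready to be opened with xace as described in step 4.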

____________  END _____________