WS215

Revision as of 09:02, 24 June 2010 by Pdavis

Release Letter

New release of WormBase WS215, Wormpep215 and Wormrna215 Thu May 27 18:15:29 BST 2010


WS215 was built by Paul Davis
======================================================================

This directory includes:
i)   database.WS215.*.tar.gz    -   compressed data for new release
ii)  models.wrm.WS215           -   the latest database schema (also in above database files)
iii) CHROMOSOMES/subdir         -   contains 3 files (DNA, GFF & AGP per chromosome)
iv)  WS215-WS214.dbcomp         -   log file reporting difference from last release
v)   wormpep215.tar.gz          -   full Wormpep distribution corresponding to WS215
vi)   wormrna215.tar.gz          -   latest WormRNA release containing non-coding RNAs in the genome
vii)  confirmed_genes.WS215.gz   -   DNA sequences of all genes confirmed by EST &/or cDNA
viii) cDNA2orf.WS215.gz           -   Latest set of ORF connections to each cDNA (EST, OST, mRNA)
ix)   gene_interpolated_map_positions.WS215.gz    - Interpolated map positions for each coding/RNA gene
x)    clone_interpolated_map_positions.WS215.gz   - Interpolated map positions for each clone
xi)   best_blastp_hits.WS215.gz  - for each C. elegans WormPep protein, lists Best blastp match to
                            human, fly, yeast, C. briggsae, and SwissProt & TrEMBL proteins.
xii)  best_blastp_hits_brigprot.WS215.gz   - for each C. briggsae protein, lists Best blastp match to
                                     human, fly, yeast, C. elegans, and SwissProt & TrEMBL proteins.
xiii) geneIDs.WS215.gz   - list of all current gene identifiers with CGC & molecular names (when known)
xiv)  PCR_product2gene.WS215.gz   - Mappings between PCR products and overlapping Genes


Release notes on the web:
-------------------------
http://www.wormbase.org/wiki/index.php/Release_Schedule



Genome sequence composition:
----------------------------

        WS215           WS214           change
----------------------------------------------
a       32367418        32367418          +0
c       17780787        17780763         +24
g       17756985        17756943         +42
t       32367086        32367086          +0
n       0               0                 +0
-       0               0                 +0

Total   100272276       100272210        +66
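
The change column can be recomputed from the two releases. The sketch below is
illustrative only: the function names are hypothetical, the per-base totals are
copied from the table above, and base_composition() assumes the chromosome DNA
files are FASTA-style text (an assumption, not something this letter states).

```python
from collections import Counter


def base_composition(fasta_lines):
    """Count lower-cased residues in FASTA-style lines, skipping '>' headers."""
    counts = Counter()
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        counts.update(line.lower())
    return counts


def composition_change(new, old, symbols="acgtn-"):
    """Per-symbol difference between two composition tables."""
    return {s: new.get(s, 0) - old.get(s, 0) for s in symbols}


# Totals copied from the table above.
ws215 = {"a": 32367418, "c": 17780787, "g": 17756985, "t": 32367086, "n": 0, "-": 0}
ws214 = {"a": 32367418, "c": 17780763, "g": 17756943, "t": 32367086, "n": 0, "-": 0}

change = composition_change(ws215, ws214)
net = sum(change.values())
print(change, net)
```

With the figures above, the only non-zero changes are c (+24) and g (+42),
which together give the +66 reported in the Total row.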


Chromosomal Changes:
--------------------

Chromosome: I
221371 221370 0   ->   221371 221371 1
232041 232041 1   ->   232042 232041 0
742850 742849 0   ->   742850 742850 1
4184639 4184639 1   ->   4184640 4184639 0
7537823 7537822 0   ->   7537823 7537823 1
10436460 10436460 1   ->   10436461 10436460 0
11522478 11522477 0   ->   11522478 11522478 1
14624752 14624751 0   ->   14624753 14624753 1

Chromosome: II
213640 213639 0   ->   213640 213640 1
1577787 1577786 0   ->   1577788 1577788 1
1602997 1602996 0   ->   1602999 1602999 1
1926841 1926840 0   ->   1926844 1926844 1
2678739 2678738 0   ->   2678743 2678743 1
2796454 2796453 0   ->   2796459 2796459 1
2879047 2879046 0   ->   2879053 2879053 1
2925948 2925947 0   ->   2925955 2925955 1
3521238 3521237 0   ->   3521246 3521246 1
4754160 4754160 1   ->   4754169 4754168 0
4763297 4763297 1   ->   4763305 4763304 0
4831918 4831917 0   ->   4831925 4831925 1
5028913 5028913 1   ->   5028921 5028920 0
5079266 5079265 0   ->   5079273 5079273 1
5203317 5203316 0   ->   5203325 5203325 1
5271461 5271460 0   ->   5271470 5271470 1
5661841 5661841 1   ->   5661851 5661850 0
5667684 5667683 0   ->   5667693 5667693 1
5670255 5670254 0   ->   5670265 5670265 1
5707678 5707677 0   ->   5707689 5707689 1
5820841 5820841 1   ->   5820853 5820852 0
5938982 5938981 0   ->   5938993 5938993 1
5974548 5974547 0   ->   5974560 5974560 1
5998651 5998650 0   ->   5998664 5998664 1
6193651 6193650 0   ->   6193665 6193665 1
6200179 6200178 0   ->   6200194 6200194 1
6256081 6256080 0   ->   6256097 6256097 1
6500693 6500692 0   ->   6500710 6500710 1
6539463 6539462 0   ->   6539481 6539481 1
6700729 6700728 0   ->   6700748 6700748 1
6749218 6749218 1   ->   6749238 6749237 0
7479115 7479114 0   ->   7479134 7479134 1
7547483 7547482 0   ->   7547503 7547503 1
7642038 7642058 21   ->   7642059 7642081 23
7739394 7739394 1   ->   7739417 7739416 0
7743764 7743866 103   ->   7743786 7743885 100
7745770 7745770 1   ->   7745789 7745788 0
7866906 7866905 0   ->   7866924 7866924 1
7871213 7871213 1   ->   7871232 7871231 0
9234924 9234924 1   ->   9234942 9234941 0
9437403 9437402 0   ->   9437420 9437420 1
11207622 11207621 0   ->   11207640 11207640 1
11384800 11384799 0   ->   11384819 11384819 1
11717738 11717737 0   ->   11717758 11717758 1
12556602 12556601 0   ->   12556623 12556623 1

Chromosome: III
661018 661017 0   ->   661018 661018 1
835907 835906 0   ->   835908 835908 1
3366749 3366748 0   ->   3366751 3366751 1
4059722 4059722 1   ->   4059725 4059724 0
5105426 5105425 0   ->   5105428 5105428 1
5333261 5333260 0   ->   5333264 5333264 1
5443182 5443181 0   ->   5443186 5443186 1
5467693 5467693 1   ->   5467698 5467697 0
5531120 5531119 0   ->   5531124 5531124 1
6033358 6033358 1   ->   6033363 6033362 0
6193202 6193201 0   ->   6193206 6193206 1
6322256 6322256 1   ->   6322261 6322260 0
6412493 6412492 0   ->   6412497 6412497 1
6468411 6468410 0   ->   6468416 6468416 1
6757605 6757604 0   ->   6757611 6757611 1
6888304 6888303 0   ->   6888311 6888311 1
7357296 7357295 0   ->   7357304 7357304 1
7587398 7587558 161   ->   7587407 7587565 159
7587845 7587845 1   ->   7587852 7587851 0
7882072 7882072 1   ->   7882078 7882077 0
8042710 8042709 0   ->   8042715 8042715 1
8068404 8068403 0   ->   8068410 8068410 1
8130860 8130859 0   ->   8130867 8130867 1
8472517 8472516 0   ->   8472525 8472525 1
8552918 8552917 0   ->   8552927 8552927 1
8583371 8583370 0   ->   8583381 8583381 1
8888186 8888185 0   ->   8888197 8888197 1
8976444 8976444 1   ->   8976456 8976455 0
9199647 9199646 0   ->   9199658 9199658 1
9350790 9350789 0   ->   9350802 9350802 1
10460403 10460402 0   ->   10460416 10460416 1
10483640 10483640 1   ->   10483654 10483653 0
10509770 10509769 0   ->   10509783 10509783 1
10551516 10551516 1   ->   10551530 10551529 0
10552638 10552637 0   ->   10552651 10552651 1
10553066 10553065 0   ->   10553080 10553080 1

Chromosome: IV
1515191 1515190 0   ->   1515191 1515191 1
3382096 3382096 1   ->   3382097 3382096 0
4583606 4583605 0   ->   4583606 4583606 1
4801204 4801203 0   ->   4801205 4801205 1
5208046 5208045 0   ->   5208048 5208048 1
5405347 5405346 0   ->   5405350 5405350 1
6006168 6006167 0   ->   6006172 6006172 1
9195668 9195667 0   ->   9195673 9195673 1
9840852 9840851 0   ->   9840858 9840858 1
10414831 10414830 0   ->   10414838 10414838 1
10894242 10894241 0   ->   10894250 10894250 1
11050311 11050311 1   ->   11050320 11050319 0
12091259 12091258 0   ->   12091267 12091267 1
12433585 12433585 1   ->   12433594 12433593 0
13188309 13188308 0   ->   13188317 13188317 1
13791001 13791001 1   ->   13791010 13791009 0
15965559 15965558 0   ->   15965567 15965567 1

Chromosome: V
653489 653489 1   ->   653489 653488 0
3556390 3556389 0   ->   3556389 3556389 1
8233595 8233594 0   ->   8233595 8233595 1
8757755 8757754 0   ->   8757756 8757756 1
9113932 9113931 0   ->   9113934 9113934 1
9669525 9669524 0   ->   9669528 9669528 1
11495161 11495161 1   ->   11495165 11495164 0
12810476 12810475 0   ->   12810479 12810479 1
13506615 13506614 0   ->   13506619 13506619 1
13665874 13665873 0   ->   13665879 13665879 1

Chromosome: X
425164 425164 1   ->   425164 425163 0
2234457 2234457 1   ->   2234456 2234455 0
3413858 3413857 0   ->   3413856 3413856 1
3812192 3812191 0   ->   3812191 3812191 1
4565293 4565292 0   ->   4565293 4565293 1
4603455 4603455 1   ->   4603456 4603455 0
5170173 5170172 0   ->   5170173 5170173 1
5462747 5462746 0   ->   5462748 5462748 1
5526944 5526943 0   ->   5526946 5526946 1
5590456 5590455 0   ->   5590459 5590459 1
5599499 5599498 0   ->   5599503 5599503 1
6059577 6059577 1   ->   6059582 6059581 0
6075483 6075482 0   ->   6075487 6075487 1
6283378 6283377 0   ->   6283383 6283383 1
7139446 7139446 1   ->   7139452 7139451 0
7145340 7145339 0   ->   7145345 7145345 1
7436795 7436794 0   ->   7436801 7436801 1
7986172 7986171 0   ->   7986179 7986179 1
8325359 8325359 1   ->   8325367 8325366 0
8596018 8596062 45   ->   8596025 8596069 45
8601957 8601956 0   ->   8601964 8601964 1
8639860 8639860 1   ->   8639868 8639867 0
9235232 9235231 0   ->   9235239 9235239 1
11423706 11423705 0   ->   11423714 11423714 1
14574517 14574517 1   ->   14574526 14574525 0
14621700 14621699 0   ->   14621708 14621708 1
14634338 14634337 0   ->   14634347 14634347 1
17228979 17228978 0   ->   17228989 17228989 1
17714411 17714410 0   ->   17714422 17714422 1
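
The lines above can be read mechanically. The following sketch assumes each
line has the layout "old_start old_end old_length -> new_start new_end
new_length", with WS214 coordinates on the left and WS215 coordinates on the
right. That reading is inferred from the data, not documented here, and the
function names are hypothetical. The sample lines are the chromosome I entries
listed above.

```python
import re

# Assumed layout: "<old_start> <old_end> <old_len> -> <new_start> <new_end> <new_len>"
CHANGE_RE = re.compile(r"(\d+)\s+(\d+)\s+(\d+)\s*->\s*(\d+)\s+(\d+)\s+(\d+)")


def parse_change(line):
    """Parse one change line into old/new start, length and net length change."""
    match = CHANGE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised change line: {line!r}")
    old_start, _old_end, old_len, new_start, _new_end, new_len = map(int, match.groups())
    return {
        "old_start": old_start,
        "old_len": old_len,
        "new_start": new_start,
        "new_len": new_len,
        "delta": new_len - old_len,
    }


def offsets_consistent(changes):
    """Check that each new start is the old start shifted by all earlier deltas."""
    offset = 0
    for change in changes:
        if change["new_start"] - change["old_start"] != offset:
            return False
        offset += change["delta"]
    return True


CHROMOSOME_I = """\
221371 221370 0   ->   221371 221371 1
232041 232041 1   ->   232042 232041 0
742850 742849 0   ->   742850 742850 1
4184639 4184639 1   ->   4184640 4184639 0
7537823 7537822 0   ->   7537823 7537823 1
10436460 10436460 1   ->   10436461 10436460 0
11522478 11522477 0   ->   11522478 11522478 1
14624752 14624751 0   ->   14624753 14624753 1
""".splitlines()

changes = [parse_change(line) for line in CHROMOSOME_I]
net_change = sum(c["delta"] for c in changes)
print(net_change, offsets_consistent(changes))
```

For chromosome I this gives five single-base gains and three single-base
losses, a net change of +2, and the shift in start coordinates matches the
running total of earlier changes.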


Gene data set (Live C.elegans genes 40156)
------------------------------------------
Molecular_info              38461 (95.8%)
Concise_description         5671 (14.1%)
Reference                   13964 (34.8%)
WormBase_approved Gene name 25970 (64.7%)
RNAi_result                 23000 (57.3%)
Microarray_results          21073 (52.5%)
SAGE_transcript             19123 (47.6%)




Wormpep data set:
----------------------------

There are 20349 CDS in autoace, 24599 when counting 4250 alternate splice forms.

The 24599 sequences contain 10,848,524 base pairs in total.

Modified entries      172
Deleted entries       42
New entries           66
Reappeared entries    2

Net change  +26
The difference between the total CDSs of this build (24599) and the last build (24575)
is 24, which does not equal the net change of 26. Please investigate!
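
The mismatch can be reproduced from the figures above. This is a minimal
sketch, assuming the net change is defined as new + reappeared - deleted (which
does give the reported +26). The function name is hypothetical and this is not
the build script itself.

```python
def check_wormpep_totals(total_now, total_prev, new, deleted, reappeared):
    """Compare the change in total CDS count with the net change in entries."""
    expected = new + reappeared - deleted  # assumed definition of "net change"
    observed = total_now - total_prev
    return expected, observed, expected == observed


expected, observed, consistent = check_wormpep_totals(
    total_now=24599, total_prev=24575, new=66, deleted=42, reappeared=2
)
print(expected, observed, consistent)
```

Here the entry counts imply +26 while the totals differ by +24, so the check
fails by 2, which is what the warning above reports.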




Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed             10110 (41.1%)     Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed   11766 (47.8%)     Some, but not all exon bases are covered by transcript evidence
Predicted              2723 (11.1%)     No transcriptional evidence at all



Status of entries: Protein Accessions
-------------------------------------
UniProtKB accessions  24402 (99.2%)



Status of entries: Protein_ID's in EMBL
---------------------------------------
Protein_id            24403 (99.2%)



Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis elegans entries with WormBase-approved Gene name  24324


Synchronisation with GenBank / EMBL:
------------------------------------

CHROMOSOME_I    sequence AC024793
CHROMOSOME_I    sequence AC024794
CHROMOSOME_I    sequence AC024751
CHROMOSOME_I    sequence AF067219
CHROMOSOME_I    sequence Z72503
CHROMOSOME_I    sequence AL032631
CHROMOSOME_I    sequence Z81522
CHROMOSOME_I    sequence AL132877
CHROMOSOME_II   sequence AF026210
CHROMOSOME_II   sequence AF077540
CHROMOSOME_II   sequence AF040645
CHROMOSOME_II   sequence AF025454
CHROMOSOME_II   sequence U80840
CHROMOSOME_II   sequence AC024746
CHROMOSOME_II   sequence AC006832
CHROMOSOME_II   sequence AF016671
CHROMOSOME_II   sequence U49955
CHROMOSOME_II   sequence U23519
CHROMOSOME_II   sequence U49830
CHROMOSOME_II   sequence U41278
CHROMOSOME_II   sequence U58760
CHROMOSOME_II   sequence U46753
CHROMOSOME_II   sequence U28733
CHROMOSOME_II   sequence U29535
CHROMOSOME_II   sequence U28944
CHROMOSOME_II   sequence U29244
CHROMOSOME_II   sequence U29537
CHROMOSOME_II   sequence U23528
CHROMOSOME_II   sequence U23181
CHROMOSOME_II   sequence U23450
CHROMOSOME_II   sequence U23139
CHROMOSOME_II   sequence AC006678
CHROMOSOME_II   sequence AC006678
CHROMOSOME_II   sequence U39996
CHROMOSOME_II   sequence U23147
CHROMOSOME_II   sequence U39999
CHROMOSOME_II   sequence U23517
CHROMOSOME_II   sequence U21308
CHROMOSOME_II   sequence U28729
CHROMOSOME_II   sequence U23168
CHROMOSOME_II   sequence Z36753
CHROMOSOME_II   sequence Z68317
CHROMOSOME_II   sequence Z50859
CHROMOSOME_II   sequence Z78412
CHROMOSOME_II   sequence Z81534
CHROMOSOME_II   sequence Z48367
CHROMOSOME_II   sequence Z49127
CHROMOSOME_II   sequence Z81544
CHROMOSOME_III  sequence U22832
CHROMOSOME_III  sequence U00052
CHROMOSOME_III  sequence Z34800
CHROMOSOME_III  sequence Z38112
CHROMOSOME_III  sequence U12966
CHROMOSOME_III  sequence U39850
CHROMOSOME_III  sequence U00057
CHROMOSOME_III  sequence U23514
CHROMOSOME_III  sequence U00065
CHROMOSOME_III  sequence U50193
CHROMOSOME_III  sequence U39851
CHROMOSOME_III  sequence U23177
CHROMOSOME_III  sequence U00048
CHROMOSOME_III  sequence U00043
CHROMOSOME_III  sequence U00054
CHROMOSOME_III  sequence U00044
CHROMOSOME_III  sequence U28991
CHROMOSOME_III  sequence AC006679
CHROMOSOME_III  sequence L16621
CHROMOSOME_III  sequence L14331
CHROMOSOME_III  sequence L15188
CHROMOSOME_III  sequence L09634
CHROMOSOME_III  sequence L16622
CHROMOSOME_III  sequence L16559
CHROMOSOME_III  sequence Z11115
CHROMOSOME_III  sequence Z12017
CHROMOSOME_III  sequence Z19157
CHROMOSOME_III  sequence Z27078
CHROMOSOME_III  sequence Z35639
CHROMOSOME_III  sequence Z35640
CHROMOSOME_III  sequence Z50016
CHROMOSOME_IV   sequence AF047658
CHROMOSOME_IV   sequence AF038604
CHROMOSOME_IV   sequence U40801
CHROMOSOME_IV   sequence AF100669
CHROMOSOME_IV   sequence AC024838
CHROMOSOME_IV   sequence AC006833
CHROMOSOME_IV   sequence U97593
CHROMOSOME_IV   sequence Z68296
CHROMOSOME_IV   sequence Z70284
CHROMOSOME_IV   sequence Z70686
CHROMOSOME_IV   sequence Z69383
CHROMOSOME_IV   sequence Z68760
CHROMOSOME_IV   sequence Z68297
CHROMOSOME_IV   sequence Z70682
CHROMOSOME_IV   sequence Z81093
CHROMOSOME_IV   sequence AL021492
CHROMOSOME_IV   sequence AL110479
CHROMOSOME_V    sequence AF024503
CHROMOSOME_V    sequence AF068713
CHROMOSOME_V    sequence U64858
CHROMOSOME_V    sequence U64833
CHROMOSOME_V    sequence U55364
CHROMOSOME_V    sequence Z70757
CHROMOSOME_V    sequence Z74042
CHROMOSOME_V    sequence Z75553
CHROMOSOME_V    sequence Z77668
CHROMOSOME_V    sequence Z81094
CHROMOSOME_X    sequence U41553
CHROMOSOME_X    sequence U50199
CHROMOSOME_X    sequence U42835
CHROMOSOME_X    sequence U40059
CHROMOSOME_X    sequence U53338
CHROMOSOME_X    sequence U39653
CHROMOSOME_X    sequence U41272
CHROMOSOME_X    sequence U39742
CHROMOSOME_X    sequence U40410
CHROMOSOME_X    sequence U41021
CHROMOSOME_X    sequence AC006618
CHROMOSOME_X    sequence U53344
CHROMOSOME_X    sequence U46673
CHROMOSOME_X    sequence U23526
CHROMOSOME_X    sequence U28732
CHROMOSOME_X    sequence U64859
CHROMOSOME_X    sequence U29614
CHROMOSOME_X    sequence U39741
CHROMOSOME_X    sequence U39855
CHROMOSOME_X    sequence Z66563
CHROMOSOME_X    sequence Z79598
CHROMOSOME_X    sequence Z79639
CHROMOSOME_X    sequence U43283
CHROMOSOME_X    sequence AC199239

There are no gaps remaining in the genome sequence
---------------
For more info mail worm@sanger.ac.uk
-===================================================================================-



New Data:
---------

* 22 new microarray datasets were added to WS215, including 355 new experiments and 46 expression clusters. 

* Variation objects now have a stable ID of the form: 'WBVar00000001'

* Brugia pre-release
=------------------=
Augustus gene predictions provided by Erich Schwarz were merged into already existing predictions from TIGR.

original Augustus data available at ftp://caltech.wormbase.org/pub/schwarz/wormbase/Brugia
new Brugia data is available on the WormBase FTP server under WS215

The quality of the Augustus predictions seems to be comparable to, and slightly better than, that of the TIGR ones:
Average % positive id of the best C.elegans homolog increased from 84% to 85%.
The percentage of proteins containing PFAM motifs increased from 36% to 40%.

Total number of genes in the new geneset: 19107
Total number of isoforms in the new geneset: 22112

Due to the quality of the B.malayi assembly, many genes are partial and lack the N- or C-terminus.
For this reason it is important to respect the phases in the GFF files.
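
As an illustration of what respecting the phase means, the sketch below trims a
partial CDS segment to its reading frame before splitting it into codons. It
assumes the usual GFF convention that the phase (0, 1 or 2) is the number of
bases to skip from the start of the feature to reach the first base of the next
complete codon, and that the segment is already in transcript orientation. The
function names and the example sequence are hypothetical, not taken from the
Brugia data.

```python
def trim_to_frame(segment, phase):
    """Drop `phase` leading bases, then any trailing bases of an incomplete codon."""
    if phase not in (0, 1, 2):
        raise ValueError(f"phase must be 0, 1 or 2, got {phase!r}")
    in_frame = segment[phase:]
    return in_frame[: len(in_frame) - len(in_frame) % 3]


def split_codons(sequence):
    """Split an in-frame sequence into consecutive three-base codons."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


# Hypothetical partial CDS segment whose GFF phase column reads 2.
partial_cds = "GATGGCCAAATT"
codons = split_codons(trim_to_frame(partial_cds, 2))
print(codons)
```

Translating the same segment from its first base (that is, ignoring the phase)
would read it in the wrong frame, which is why partial gene models need the
phase applied first.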

Another geneset using RNAseq data provided by the Pathogen Sequencing Unit of the WTSI is in preparation and should be merged into a future WormBase release.

Additional quality filtering will be done and an updated "final" Brugia (TIGR|Augustus) version should be available for WS216.


Genome sequence updates:
-----------------------

There have been 151 single base changes made to the genome for this release.

These changes have been made based on RNAseq data analysis from LaDeana Hillier and Robert Waterston (1).
Source:modENCODE http://www.modencode.org/Waterston.shtml
These changes were verified and confirmed by Gary Williams using N2 illumina data from Matt Berriman and Julie Ahringer.

In the majority of cases 

The net result is an increase of 66bp, the details of which can be seen in the 
Genome sequence composition: and Chromosomal Changes: sections above.


New Fixes:
----------

RNAi - We have remapped the RNAi targets for this build. We were not filtering out the 
highly fragmented hits that occurred when we previously remapped the RNAi objects (WS210).
That is, when many very short alignments occurred close together on the genome our script
was concatenating these splits, much like it would do when it skips over introns.

These hits caused many errant primary and secondary targets. These have been removed from
the database.
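
A filter of the kind described might look like the sketch below. It is not the
WormBase remapping script: the representation of a hit as a list of
(start, end) alignment blocks, the function name, and both thresholds are
assumptions chosen purely for illustration.

```python
def is_fragmented(blocks, min_block_len=20, max_short_fraction=0.5):
    """Return True when most alignment blocks of a hit are very short.

    blocks: list of (start, end) genomic coordinates, inclusive.
    min_block_len and max_short_fraction are hypothetical thresholds.
    """
    if not blocks:
        return True
    short = sum(1 for start, end in blocks if end - start + 1 < min_block_len)
    return short / len(blocks) > max_short_fraction


def filter_hits(hits):
    """Keep only hits that are not dominated by very short blocks."""
    return {name: blocks for name, blocks in hits.items() if not is_fragmented(blocks)}


# Hypothetical hits: one spliced alignment and one highly fragmented one.
hits = {
    "spliced_hit": [(100, 350), (420, 700)],
    "fragmented_hit": [(100, 105), (110, 114), (120, 126), (130, 400)],
}
kept = filter_hits(hits)
print(sorted(kept))
```

The idea is to reject such hits before their blocks are concatenated, so that
runs of short alignments are not treated like exons separated by introns.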


Known Problems:
---------------


Other Changes:
--------------

Proposed Changes / Forthcoming Data:
-------------------------------------
Pristionchus pacificus
Gene model corrections to 17 genes of interest.

Brugia gene model corrections at ~600 loci.

Model Changes:
------------------------------------
Added NCBI tax id to ?Species

Removed ?Journal class

Added Tags for grouping ?Analysis objects

Added Sex, Food and duration info to ?Condition

Added Species to ?Rearrangement


-===================================================================================-


Quick installation guide for UNIX/Linux systems
-----------------------------------------------

1. Create a new directory to contain your copy of WormBase,
        e.g. /users/yourname/wormbase

2. Unpack and untar all of the database.*.tar.gz files into
        this directory. You will need approximately 2-3 GB of disk space.

3. Obtain and install a suitable acedb binary for your system
        (available from www.acedb.org).

4. Use the acedb 'xace' program to open your database, e.g.
        type 'xace /users/yourname/wormbase' at the command prompt.

5. See the acedb website for more information about acedb and
        using xace.
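
Steps 1 and 2 can also be scripted. The sketch below uses only the Python
standard library. The function name and its arguments are hypothetical, and it
assumes the database.*.tar.gz files have already been downloaded from a trusted
source into download_dir.

```python
import glob
import os
import tarfile


def unpack_release(download_dir, target_dir, pattern="database.*.tar.gz"):
    """Create target_dir and extract every matching archive into it."""
    os.makedirs(target_dir, exist_ok=True)
    archives = sorted(glob.glob(os.path.join(download_dir, pattern)))
    for path in archives:
        # "r:gz" opens a gzip-compressed tar archive for reading.
        with tarfile.open(path, "r:gz") as archive:
            archive.extractall(target_dir)
    return archives
```

For example, unpack_release(".", "/users/yourname/wormbase") would unpack the
archives in the current directory, after which steps 3 and 4 (installing acedb
and running 'xace /users/yourname/wormbase') proceed as described above.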

____________  END _____________


Citations

1) Nature 2009 Jun 18;459(7249):927-30. Unlocking the Secrets of the Genome


Bugs/Files to copy over later

Brugia GFF files still being prepared; copy over later

pecan alignments being packaged up

wiggle re-dumped