WS215
Release Letter
New release of WormBase WS215, Wormpep215 and Wormrna215 Thu May 27 18:15:29 BST 2010

WS215 was built by Paul Davis
======================================================================

This directory includes:
i)    database.WS215.*.tar.gz - compressed data for new release
ii)   models.wrm.WS215 - the latest database schema (also in above database files)
iii)  CHROMOSOMES/subdir - contains 3 files (DNA, GFF & AGP per chromosome)
iv)   WS215-WS214.dbcomp - log file reporting difference from last release
v)    wormpep215.tar.gz - full Wormpep distribution corresponding to WS215
vi)   wormrna215.tar.gz - latest WormRNA release containing non-coding RNAs in the genome
vii)  confirmed_genes.WS215.gz - DNA sequences of all genes confirmed by EST &/or cDNA
viii) cDNA2orf.WS215.gz - Latest set of ORF connections to each cDNA (EST, OST, mRNA)
ix)   gene_interpolated_map_positions.WS215.gz - Interpolated map positions for each coding/RNA gene
x)    clone_interpolated_map_positions.WS215.gz - Interpolated map positions for each clone
xi)   best_blastp_hits.WS215.gz - for each C. elegans WormPep protein, lists Best blastp match to
      human, fly, yeast, C. briggsae, and SwissProt & TrEMBL proteins.
xii)  best_blastp_hits_brigprot.WS215.gz - for each C. briggsae protein, lists Best blastp match to
      human, fly, yeast, C. elegans, and SwissProt & TrEMBL proteins.
xiii) geneIDs.WS215.gz - list of all current gene identifiers with CGC & molecular names (when known)
xiv)  PCR_product2gene.WS215.gz - Mappings between PCR products and overlapping Genes

Release notes on the web:
-------------------------
http://www.wormbase.org/wiki/index.php/Release_Schedule

Genome sequence composition:
----------------------------
        WS215       WS214       change
----------------------------------------------
a       32367418    32367418    +0
c       17780787    17780763    +24
g       17756985    17756943    +42
t       32367086    32367086    +0
n       0           0           +0
-       0           0           +0

Total   100272276   100272210   +66

Chromosomal Changes:
--------------------
Chromosome: I
221371 221370 0 -> 221371 221371 1
232041 232041 1 -> 232042 232041 0
742850 742849 0 -> 742850 742850 1
4184639 4184639 1 -> 4184640 4184639 0
7537823 7537822 0 -> 7537823 7537823 1
10436460 10436460 1 -> 10436461 10436460 0
11522478 11522477 0 -> 11522478 11522478 1
14624752 14624751 0 -> 14624753 14624753 1

Chromosome: II
213640 213639 0 -> 213640 213640 1
1577787 1577786 0 -> 1577788 1577788 1
1602997 1602996 0 -> 1602999 1602999 1
1926841 1926840 0 -> 1926844 1926844 1
2678739 2678738 0 -> 2678743 2678743 1
2796454 2796453 0 -> 2796459 2796459 1
2879047 2879046 0 -> 2879053 2879053 1
2925948 2925947 0 -> 2925955 2925955 1
3521238 3521237 0 -> 3521246 3521246 1
4754160 4754160 1 -> 4754169 4754168 0
4763297 4763297 1 -> 4763305 4763304 0
4831918 4831917 0 -> 4831925 4831925 1
5028913 5028913 1 -> 5028921 5028920 0
5079266 5079265 0 -> 5079273 5079273 1
5203317 5203316 0 -> 5203325 5203325 1
5271461 5271460 0 -> 5271470 5271470 1
5661841 5661841 1 -> 5661851 5661850 0
5667684 5667683 0 -> 5667693 5667693 1
5670255 5670254 0 -> 5670265 5670265 1
5707678 5707677 0 -> 5707689 5707689 1
5820841 5820841 1 -> 5820853 5820852 0
5938982 5938981 0 -> 5938993 5938993 1
5974548 5974547 0 -> 5974560 5974560 1
5998651 5998650 0 -> 5998664 5998664 1
6193651 6193650 0 -> 6193665 6193665 1
6200179 6200178 0 -> 6200194 6200194 1
6256081 6256080 0 -> 6256097 6256097 1
6500693 6500692 0 -> 6500710 6500710 1
6539463 6539462 0 -> 6539481 6539481 1
6700729 6700728 0 -> 6700748 6700748 1
6749218 6749218 1 -> 6749238 6749237 0
7479115 7479114 0 -> 7479134 7479134 1
7547483 7547482 0 -> 7547503 7547503 1
7642038 7642058 21 -> 7642059 7642081 23
7739394 7739394 1 -> 7739417 7739416 0
7743764 7743866 103 -> 7743786 7743885 100
7745770 7745770 1 -> 7745789 7745788 0
7866906 7866905 0 -> 7866924 7866924 1
7871213 7871213 1 -> 7871232 7871231 0
9234924 9234924 1 -> 9234942 9234941 0
9437403 9437402 0 -> 9437420 9437420 1
11207622 11207621 0 -> 11207640 11207640 1
11384800 11384799 0 -> 11384819 11384819 1
11717738 11717737 0 -> 11717758 11717758 1
12556602 12556601 0 -> 12556623 12556623 1

Chromosome: III
661018 661017 0 -> 661018 661018 1
835907 835906 0 -> 835908 835908 1
3366749 3366748 0 -> 3366751 3366751 1
4059722 4059722 1 -> 4059725 4059724 0
5105426 5105425 0 -> 5105428 5105428 1
5333261 5333260 0 -> 5333264 5333264 1
5443182 5443181 0 -> 5443186 5443186 1
5467693 5467693 1 -> 5467698 5467697 0
5531120 5531119 0 -> 5531124 5531124 1
6033358 6033358 1 -> 6033363 6033362 0
6193202 6193201 0 -> 6193206 6193206 1
6322256 6322256 1 -> 6322261 6322260 0
6412493 6412492 0 -> 6412497 6412497 1
6468411 6468410 0 -> 6468416 6468416 1
6757605 6757604 0 -> 6757611 6757611 1
6888304 6888303 0 -> 6888311 6888311 1
7357296 7357295 0 -> 7357304 7357304 1
7587398 7587558 161 -> 7587407 7587565 159
7587845 7587845 1 -> 7587852 7587851 0
7882072 7882072 1 -> 7882078 7882077 0
8042710 8042709 0 -> 8042715 8042715 1
8068404 8068403 0 -> 8068410 8068410 1
8130860 8130859 0 -> 8130867 8130867 1
8472517 8472516 0 -> 8472525 8472525 1
8552918 8552917 0 -> 8552927 8552927 1
8583371 8583370 0 -> 8583381 8583381 1
8888186 8888185 0 -> 8888197 8888197 1
8976444 8976444 1 -> 8976456 8976455 0
9199647 9199646 0 -> 9199658 9199658 1
9350790 9350789 0 -> 9350802 9350802 1
10460403 10460402 0 -> 10460416 10460416 1
10483640 10483640 1 -> 10483654 10483653 0
10509770 10509769 0 -> 10509783 10509783 1
10551516 10551516 1 -> 10551530 10551529 0
10552638 10552637 0 -> 10552651 10552651 1
10553066 10553065 0 -> 10553080 10553080 1

Chromosome: IV
1515191 1515190 0 -> 1515191 1515191 1
3382096 3382096 1 -> 3382097 3382096 0
4583606 4583605 0 -> 4583606 4583606 1
4801204 4801203 0 -> 4801205 4801205 1
5208046 5208045 0 -> 5208048 5208048 1
5405347 5405346 0 -> 5405350 5405350 1
6006168 6006167 0 -> 6006172 6006172 1
9195668 9195667 0 -> 9195673 9195673 1
9840852 9840851 0 -> 9840858 9840858 1
10414831 10414830 0 -> 10414838 10414838 1
10894242 10894241 0 -> 10894250 10894250 1
11050311 11050311 1 -> 11050320 11050319 0
12091259 12091258 0 -> 12091267 12091267 1
12433585 12433585 1 -> 12433594 12433593 0
13188309 13188308 0 -> 13188317 13188317 1
13791001 13791001 1 -> 13791010 13791009 0
15965559 15965558 0 -> 15965567 15965567 1

Chromosome: V
653489 653489 1 -> 653489 653488 0
3556390 3556389 0 -> 3556389 3556389 1
8233595 8233594 0 -> 8233595 8233595 1
8757755 8757754 0 -> 8757756 8757756 1
9113932 9113931 0 -> 9113934 9113934 1
9669525 9669524 0 -> 9669528 9669528 1
11495161 11495161 1 -> 11495165 11495164 0
12810476 12810475 0 -> 12810479 12810479 1
13506615 13506614 0 -> 13506619 13506619 1
13665874 13665873 0 -> 13665879 13665879 1

Chromosome: X
425164 425164 1 -> 425164 425163 0
2234457 2234457 1 -> 2234456 2234455 0
3413858 3413857 0 -> 3413856 3413856 1
3812192 3812191 0 -> 3812191 3812191 1
4565293 4565292 0 -> 4565293 4565293 1
4603455 4603455 1 -> 4603456 4603455 0
5170173 5170172 0 -> 5170173 5170173 1
5462747 5462746 0 -> 5462748 5462748 1
5526944 5526943 0 -> 5526946 5526946 1
5590456 5590455 0 -> 5590459 5590459 1
5599499 5599498 0 -> 5599503 5599503 1
6059577 6059577 1 -> 6059582 6059581 0
6075483 6075482 0 -> 6075487 6075487 1
6283378 6283377 0 -> 6283383 6283383 1
7139446 7139446 1 -> 7139452 7139451 0
7145340 7145339 0 -> 7145345 7145345 1
7436795 7436794 0 -> 7436801 7436801 1
7986172 7986171 0 -> 7986179 7986179 1
8325359 8325359 1 -> 8325367 8325366 0
8596018 8596062 45 -> 8596025 8596069 45
8601957 8601956 0 -> 8601964 8601964 1
8639860 8639860 1 -> 8639868 8639867 0
9235232 9235231 0 -> 9235239 9235239 1
11423706 11423705 0 -> 11423714 11423714 1
14574517 14574517 1 -> 14574526 14574525 0
14621700 14621699 0 -> 14621708 14621708 1
14634338 14634337 0 -> 14634347 14634347 1
17228979 17228978 0 -> 17228989 17228989 1
17714411 17714410 0 -> 17714422 17714422 1

Gene data set (Live C.elegans genes 40156)
------------------------------------------
Molecular_info                38461 (95.8%)
Concise_description            5671 (14.1%)
Reference                     13964 (34.8%)
WormBase_approved Gene name   25970 (64.7%)
RNAi_result                   23000 (57.3%)
Microarray_results            21073 (52.5%)
SAGE_transcript               19123 (47.6%)

Wormpep data set:
----------------------------
There are 20349 CDS in autoace, 24599 when counting 4250 alternate splice forms.

The 24599 sequences contain 10,848,524 base pairs in total.

Modified entries      172
Deleted entries        42
New entries            66
Reappeared entries      2

Net change  +26
The difference between the total CDSs of this (24599) and the last build (24575) does not equal the net change 26
Please investigate!

Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed            10110 (41.1%)  Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed  11766 (47.8%)  Some, but not all exon bases are covered by transcript evidence
Predicted             2723 (11.1%)  No transcriptional evidence at all

Status of entries: Protein Accessions
-------------------------------------
UniProtKB accessions  24402 (99.2%)

Status of entries: Protein_IDs in EMBL
---------------------------------------
Protein_id  24403 (99.2%)

Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis elegans entries with WormBase-approved Gene name  24324

Synchronisation with GenBank / EMBL:
------------------------------------
CHROMOSOME_I sequence AC024793
CHROMOSOME_I sequence AC024794
CHROMOSOME_I sequence AC024751
CHROMOSOME_I sequence AF067219
CHROMOSOME_I sequence Z72503
CHROMOSOME_I sequence AL032631
CHROMOSOME_I sequence Z81522
CHROMOSOME_I sequence AL132877
CHROMOSOME_II sequence AF026210
CHROMOSOME_II sequence AF077540
CHROMOSOME_II sequence AF040645
CHROMOSOME_II sequence AF025454
CHROMOSOME_II sequence U80840
CHROMOSOME_II sequence AC024746
CHROMOSOME_II sequence AC006832
CHROMOSOME_II sequence AF016671
CHROMOSOME_II sequence U49955
CHROMOSOME_II sequence U23519
CHROMOSOME_II sequence U49830
CHROMOSOME_II sequence U41278
CHROMOSOME_II sequence U58760
CHROMOSOME_II sequence U46753
CHROMOSOME_II sequence U28733
CHROMOSOME_II sequence U29535
CHROMOSOME_II sequence U28944
CHROMOSOME_II sequence U29244
CHROMOSOME_II sequence U29537
CHROMOSOME_II sequence U23528
CHROMOSOME_II sequence U23181
CHROMOSOME_II sequence U23450
CHROMOSOME_II sequence U23139
CHROMOSOME_II sequence AC006678
CHROMOSOME_II sequence AC006678
CHROMOSOME_II sequence U39996
CHROMOSOME_II sequence U23147
CHROMOSOME_II sequence U39999
CHROMOSOME_II sequence U23517
CHROMOSOME_II sequence U21308
CHROMOSOME_II sequence U28729
CHROMOSOME_II sequence U23168
CHROMOSOME_II sequence Z36753
CHROMOSOME_II sequence Z68317
CHROMOSOME_II sequence Z50859
CHROMOSOME_II sequence Z78412
CHROMOSOME_II sequence Z81534
CHROMOSOME_II sequence Z48367
CHROMOSOME_II sequence Z49127
CHROMOSOME_II sequence Z81544
CHROMOSOME_III sequence U22832
CHROMOSOME_III sequence U00052
CHROMOSOME_III sequence Z34800
CHROMOSOME_III sequence Z38112
CHROMOSOME_III sequence U12966
CHROMOSOME_III sequence U39850
CHROMOSOME_III sequence U00057
CHROMOSOME_III sequence U23514
CHROMOSOME_III sequence U00065
CHROMOSOME_III sequence U50193
CHROMOSOME_III sequence U39851
CHROMOSOME_III sequence U23177
CHROMOSOME_III sequence U00048
CHROMOSOME_III sequence U00043
CHROMOSOME_III sequence U00054
CHROMOSOME_III sequence U00044
CHROMOSOME_III sequence U28991
CHROMOSOME_III sequence AC006679
CHROMOSOME_III sequence L16621
CHROMOSOME_III sequence L14331
CHROMOSOME_III sequence L15188
CHROMOSOME_III sequence L09634
CHROMOSOME_III sequence L16622
CHROMOSOME_III sequence L16559
CHROMOSOME_III sequence Z11115
CHROMOSOME_III sequence Z12017
CHROMOSOME_III sequence Z19157
CHROMOSOME_III sequence Z27078
CHROMOSOME_III sequence Z35639
CHROMOSOME_III sequence Z35640
CHROMOSOME_III sequence Z50016
CHROMOSOME_IV sequence AF047658
CHROMOSOME_IV sequence AF038604
CHROMOSOME_IV sequence U40801
CHROMOSOME_IV sequence AF100669
CHROMOSOME_IV sequence AC024838
CHROMOSOME_IV sequence AC006833
CHROMOSOME_IV sequence U97593
CHROMOSOME_IV sequence Z68296
CHROMOSOME_IV sequence Z70284
CHROMOSOME_IV sequence Z70686
CHROMOSOME_IV sequence Z69383
CHROMOSOME_IV sequence Z68760
CHROMOSOME_IV sequence Z68297
CHROMOSOME_IV sequence Z70682
CHROMOSOME_IV sequence Z81093
CHROMOSOME_IV sequence AL021492
CHROMOSOME_IV sequence AL110479
CHROMOSOME_V sequence AF024503
CHROMOSOME_V sequence AF068713
CHROMOSOME_V sequence U64858
CHROMOSOME_V sequence U64833
CHROMOSOME_V sequence U55364
CHROMOSOME_V sequence Z70757
CHROMOSOME_V sequence Z74042
CHROMOSOME_V sequence Z75553
CHROMOSOME_V sequence Z77668
CHROMOSOME_V sequence Z81094
CHROMOSOME_X sequence U41553
CHROMOSOME_X sequence U50199
CHROMOSOME_X sequence U42835
CHROMOSOME_X sequence U40059
CHROMOSOME_X sequence U53338
CHROMOSOME_X sequence U39653
CHROMOSOME_X sequence U41272
CHROMOSOME_X sequence U39742
CHROMOSOME_X sequence U40410
CHROMOSOME_X sequence U41021
CHROMOSOME_X sequence AC006618
CHROMOSOME_X sequence U53344
CHROMOSOME_X sequence U46673
CHROMOSOME_X sequence U23526
CHROMOSOME_X sequence U28732
CHROMOSOME_X sequence U64859
CHROMOSOME_X sequence U29614
CHROMOSOME_X sequence U39741
CHROMOSOME_X sequence U39855
CHROMOSOME_X sequence Z66563
CHROMOSOME_X sequence Z79598
CHROMOSOME_X sequence Z79639
CHROMOSOME_X sequence U43283
CHROMOSOME_X sequence AC199239

There are no gaps remaining in the genome sequence
---------------
For more info mail worm@sanger.ac.uk
-===================================================================================-

New Data:
---------
* 22 new microarray datasets were added to WS215, including 355 new experiments and 46 expression clusters.

* Variation objects now have a stable ID of the form: 'WBVar00000001'

* Brugia pre-release
=------------------=
Augustus gene predictions provided by Erich Schwarz were merged into the already existing predictions from TIGR.
The original Augustus data is available at ftp://caltech.wormbase.org/pub/schwarz/wormbase/Brugia
The new Brugia data is available on the WormBase FTP server under WS215.

The quality of the Augustus predictions seems to be comparable to, and slightly better than, that of the TIGR ones:
Average % positive id of the best C.elegans homolog increased from 84% to 85%.
The percentage of proteins containing PFAM motifs increased from 36% to 40%.

Total number of genes in the new geneset: 19107
Total number of isoforms in the new geneset: 22112

Due to the quality of the B.malayi assembly many genes are partial and do not contain the N'/C'-terminus.
For this reason it is important to respect the phases in the GFF files.

Another geneset using RNAseq data provided by the Pathogen Sequencing Unit of the WTSI is in preparation
and should be merged into a future WormBase release.
Additional quality filtering will be done and an updated "final" Brugia (TIGR|Augustus) version should be available for WS216.

Genome sequence updates:
-----------------------
There have been 151 single base changes made to the genome for this release. These changes have been
made based on RNAseq data analysis from LaDeana Hillier and confirmed by Gary Williams using N2 Illumina
data from Matt Berriman and Julie Ahringer.
The net result is an increase of 66bp, the details of which can be seen in the
"Genome sequence composition:" and "Chromosomal Changes:" sections above.

New Fixes:
----------
RNAi - We have remapped the RNAi targets for this build. We were not filtering out the highly fragmented
hits that occurred when we previously remapped the RNAi objects (WS210). That is, when many very short
alignments occurred close together on the genome, our script was concatenating these splits, much as it
does when it skips over introns. These hits caused many errant primary and secondary targets, which
have now been removed from the database.

Known Problems:
---------------

Other Changes:
--------------

Proposed Changes / Forthcoming Data:
-------------------------------------
Pristionchus pacificus
Gene model corrections to 17 genes of interest.

Brugia
Gene model correction ~600 loci

Model Changes:
------------------------------------
Added NCBI tax id to ?Species
Removed ?Journal class
Added Tags for grouping ?Analysis objects
Added Sex, Food and duration info to ?Condition
Added Species to ?Rearrangement

-===================================================================================-

Quick installation guide for UNIX/Linux systems
-----------------------------------------------
1. Create a new directory to contain your copy of WormBase,
   e.g. /users/yourname/wormbase

2. Unpack and untar all of the database.*.tar.gz files into
   this directory. You will need approximately 2-3 GB of disk space.

3. Obtain and install a suitable acedb binary for your system
   (available from www.acedb.org).

4. Use the acedb 'xace' program to open your database, e.g.
   type 'xace /users/yourname/wormbase' at the command prompt.

5. See the acedb website for more information about acedb and
   using xace.

____________ END _____________
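
Steps 1 and 2 of the quick installation guide could be scripted roughly as follows. This is a minimal sketch and not part of the WormBase distribution: the function name unpack_release and its download_dir/target_dir arguments are hypothetical, and it assumes the database.*.tar.gz archives have already been downloaded into a single local directory and come from a trusted source.

```python
import glob
import os
import tarfile


def unpack_release(download_dir, target_dir):
    """Extract every database.*.tar.gz archive found in download_dir into target_dir.

    Hypothetical helper: returns the list of archive paths that were unpacked.
    """
    # Step 1: create the directory that will hold the local copy of the database.
    os.makedirs(target_dir, exist_ok=True)

    # Step 2: unpack and untar all of the database archives into that directory.
    archives = sorted(glob.glob(os.path.join(download_dir, "database.*.tar.gz")))
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions: refuse unsafe members such as absolute paths.
                tar.extractall(target_dir, filter="data")
            else:
                tar.extractall(target_dir)
    return archives
```

For example, unpack_release("/path/to/downloads", "/users/yourname/wormbase") would populate the directory that step 4 then opens with xace; the download path here is a placeholder.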
Bugs/Files to copy over later
Brugia GFF files still being prepared; copy over later
pecan alignments being packaged up
wiggle re-dumped