WS218



Release Notes

New release of WormBase WS218, Wormpep218 and Wormrna218 Tue Aug 24
18:41:26 BST 2010


WS218 was built by Paul Davis
-===================================================================================-

The WS218 build directory includes:
genomes DIR - contains a sub dir for each WormBase species with
sequence, gff, and agp data
genomes/b_malayi: - genome_feature_tables/ sequences/
genomes/c_brenneri: - genome_feature_tables/ sequences/
genomes/c_briggsae: - genome_feature_tables/ sequences/
genomes/c_elegans: - annotation/ genome_feature_tables/ sequences/
genomes/c_japonica: - genome_feature_tables/ sequences/
genomes/c_remanei: - genome_feature_tables/ sequences/
genomes/h_bacteriophora: - genome_feature_tables/ sequences/
genomes/h_contortus: - genome_feature_tables/ sequences/
genomes/m_hapla: - genome_feature_tables/ sequences/
genomes/m_incognita: - sequences/
genomes/p_pacificus: - genome_feature_tables/ sequences/
*annotation/ - contains additional annotations
i) confirmed_genes.WS218.gz - DNA sequences of all genes confirmed by
EST &/or cDNA
ii) cDNA2orf.WS218.gz - Latest set of ORF connections to each cDNA
(EST, OST, mRNA)
iii) geneIDs.WS218.gz - list of all current gene identifiers with CGC
& molecular names (when known)
iv) PCR_product2gene.WS218.gz - Mappings between PCR products and
overlapping Genes
v) oligo_mapping.gz - V
*genome_feature_tables/ - contains the main .gff files and
supplementary .gff data
*sequences/ - contains dna/ protein/ rna/ sub dirs
sequences/protein - WormBase protein set for species + history etc.
vi) wormpep218.tar.gz - full Wormpep distribution corresponding to WS218
vii) wormrna218.tar.gz - latest WormRNA release containing non-coding
RNAs in the genome
viii) best_blastp_hits_species.WS218.gz - for each C. elegans WormPep
protein, lists Best blastp match to
human, fly, yeast, C. briggsae, and SwissProt & TrEMBL proteins.
sequences/dna - WormBase DNA data: genomic sequence (raw, soft_masked,
masked), agp
ix) intergenic_sequences.dna.gz
sequences/rna - WormBase rna gene data.
acedb DIR - Everything needed to generate a local copy of the
primary database
x) database.WS218.*.tar.gz - compressed acedb database for new release
xi) models.wrm.WS218 - the latest database schema (also in above
database files)
xii) WS218-WS217.dbcomp - log file reporting difference from last release
*Non_C_elegans_BLASTX/ - This directory contains the blastx data for
non-elegans species
(reduces the size of the main database)
COMPARATIVE_ANALYSIS DIR - compara.tar.bz2 wormpep217_clw.sql.bz2
ONTOLOGY DIR - gene_associations, obo files for (phenotype, GO, anatomy)
and associated association files


Release notes on the web:
-------------------------
http://www.wormbase.org/wiki/index.php/Release_Schedule




C. elegans Synchronisation with GenBank / EMBL:
------------------------------------

No synchronisation issues


C. elegans Chromosomal Changes:
--------------------
There are no changes to the chromosome sequences in this release.


C. elegans Gene data set (Live C.elegans genes 40112)
------------------------------------------
Molecular_info 38431 (95.8%)
Concise_description 5680 (14.2%)
Reference 14075 (35.1%)
WormBase_approved Gene name 26051 (64.9%)
RNAi_result 22872 (57.0%)
Microarray_results 18005 (44.9%)
SAGE_transcript 19141 (47.7%)
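
Each percentage above is the count divided by the number of live C. elegans genes (40112). A minimal sketch of that arithmetic, using only the figures in this table; the names LIVE_GENES, COUNTS and coverage are illustrative and not part of any WormBase tool:

```python
# Counts copied from the "C. elegans Gene data set" table above.
LIVE_GENES = 40112
COUNTS = {
    "Molecular_info": 38431,
    "Concise_description": 5680,
    "Reference": 14075,
    "WormBase_approved Gene name": 26051,
    "RNAi_result": 22872,
    "Microarray_results": 18005,
    "SAGE_transcript": 19141,
}


def coverage(count, total=LIVE_GENES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Print the table in roughly the same layout as the release notes.
for tag, count in COUNTS.items():
    print(f"{tag:30s}{count:>7d} ({coverage(count):.1f}%)")
```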


C. elegans

Wormpep data set:
----------------------------

There are 20403 CDS in autoace, 24761 when counting 4358 alternate
splice forms.

The 24761 sequences contain 10,896,867 amino acid residues in total.

Modified entries 23
Deleted entries 18
New entries 74
Reappeared entries 0

Net change +56
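
The totals in this section can be cross-checked by hand. The sketch below assumes that the net change is new plus reappeared minus deleted entries, and that modified entries leave the count unchanged; that assumption reproduces the +56 reported above but is not stated in the notes themselves. All function names are illustrative.

```python
# Figures copied from the Wormpep data set section above.
CDS_COUNT = 20403          # CDS in autoace
ALT_SPLICE_FORMS = 4358    # alternate splice forms
NEW, DELETED, REAPPEARED = 74, 18, 0


def total_sequences(cds, alt_forms):
    """Total Wormpep entries: one per CDS plus one per alternate splice form."""
    return cds + alt_forms


def net_change(new, deleted, reappeared):
    """Net change in entry count between releases.

    Assumption: modified entries do not alter the number of entries.
    """
    return new + reappeared - deleted


print(total_sequences(CDS_COUNT, ALT_SPLICE_FORMS))
print(f"{net_change(NEW, DELETED, REAPPEARED):+d}")
```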

Caenorhabditis elegans Genome sequence composition:
----------------------------

WS218 WS217 change
----------------------------------------------
a 32367418 32367418 +0
c 17780787 17780787 +0
g 17756985 17756985 +0
t 32367086 32367086 +0
n 0 0 +0
- 0 0 +0

Total 100272276 100272276 +0
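
As a worked example of reading these composition tables, the sketch below recomputes the total from the per-base counts and derives GC content. GC content is not reported in these notes; computing it over called bases only (a, c, g, t) is a choice made here for illustration, and matters for the other species, whose tables include n.

```python
# Base counts copied from the WS218 column of the C. elegans table above.
ELEGANS_WS218 = {
    "a": 32367418,
    "c": 17780787,
    "g": 17756985,
    "t": 32367086,
    "n": 0,
    "-": 0,
}


def total_bases(counts):
    """Sum of every category in the table, including n and gap characters."""
    return sum(counts.values())


def gc_percent(counts):
    """GC content as a percentage of called bases (a, c, g, t only)."""
    called = sum(counts[base] for base in "acgt")
    return 100.0 * (counts["c"] + counts["g"]) / called


print(total_bases(ELEGANS_WS218))
print(f"{gc_percent(ELEGANS_WS218):.2f}%")
```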


Pristionchus pacificus Genome sequence composition:
----------------------------
169822619 total
a 41799168
c 31168435
g 31196239
t 41802890
- 0
n 23855887


Caenorhabditis remanei Genome sequence composition:
----------------------------
145500347 total
a 42927857
c 26293828
g 26276020
t 42923178
- 0
n 7079464


Caenorhabditis japonica Genome sequence composition:
----------------------------
163282347 total
a 39053092
c 25603225
g 25576971
t 39126103
- 0
n 33922956


Caenorhabditis briggsae Genome sequence composition:
----------------------------
108478630 total
a 33004189
c 19675861
g 19707411
t 33049803
- 0
n 3041366


Caenorhabditis brenneri Genome sequence composition:
----------------------------
190829385 total
a 52330865
c 32919987
g 32964675
t 52271637
- 0
n 20342221




Tier II Gene counts
---------------------------------------------
pristionchus Gene count 29638 (Coding 29639)
remanei Gene count 32431 (Coding 31476)
heterorhabditis Gene count 0 (Coding 0)
japonica Gene count 27177 (Coding 25870)
briggsae Gene count 23044 (Coding 21997)
brenneri Gene count 32295 (Coding 30663)
---------------------------------------------

-------------------------------------------------
Caenorhabditis elegans Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount
of transcript evidence)
-------------------------------------------------
Confirmed 11606 (46.9%) Every base of every exon has transcription
evidence (mRNA, EST etc.)
Partially_confirmed 10980 (44.3%) Some, but not all exon bases are
covered by transcript evidence
Predicted 2175 ( 8.8%) No transcriptional evidence at all
24761



Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis elegans entries with WormBase-approved Gene name 24418




-------------------------------------------------
Pristionchus pacificus Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount
of transcript evidence)
-------------------------------------------------
Confirmed 425 (1.4%) Every base of every exon has transcription
evidence (mRNA, EST etc.)
Partially_confirmed 5309 (17.9%) Some, but not all exon bases are
covered by transcript evidence
Predicted 23905 (80.7%) No transcriptional evidence at all



Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Pristionchus pacificus entries with WormBase-approved Gene name 2785




-------------------------------------------------
Caenorhabditis remanei Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount
of transcript evidence)
-------------------------------------------------
Confirmed 955 (3.0%) Every base of every exon has transcription
evidence (mRNA, EST etc.)
Partially_confirmed 5662 (18.0%) Some, but not all exon bases are
covered by transcript evidence
Predicted 24859 (79.0%) No transcriptional evidence at all



Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis remanei entries with WormBase-approved Gene name 5452




-------------------------------------------------
Caenorhabditis japonica Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount
of transcript evidence)
-------------------------------------------------
Confirmed 1182 (4.6%) Every base of every exon has transcription
evidence (mRNA, EST etc.)
Partially_confirmed 4974 (19.2%) Some, but not all exon bases are
covered by transcript evidence
Predicted 19714 (76.2%) No transcriptional evidence at all



Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis japonica entries with WormBase-approved Gene name 4783




-------------------------------------------------
Caenorhabditis briggsae Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount
of transcript evidence)
-------------------------------------------------
Confirmed 52 (0.2%) Every base of every exon has transcription
evidence (mRNA, EST etc.)
Partially_confirmed 856 (3.9%) Some, but not all exon bases are
covered by transcript evidence
Predicted 21089 (95.9%) No transcriptional evidence at all



Status of entries: Protein Accessions
-------------------------------------
UniProtKB accessions 21709 (98.7%)

Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis briggsae entries with WormBase-approved Gene name 5486




-------------------------------------------------
Caenorhabditis brenneri Protein Stats:
-------------------------------------------------
Status of entries: Confidence level of prediction (based on the amount
of transcript evidence)
-------------------------------------------------
Confirmed 1511 (4.9%) Every base of every exon has transcription
evidence (mRNA, EST etc.)
Partially_confirmed 5632 (18.4%) Some, but not all exon bases are
covered by transcript evidence
Predicted 23520 (76.7%) No transcriptional evidence at all



Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis brenneri entries with WormBase-approved Gene name 3086


C. elegans Operons Stats
---------------------------------------------
Description: These exist as closely spaced gene clusters similar to
bacterial operons
---------------------------------------------
| Live Operons 1267 |
| Genes in Operons 3268 |



GO Annotation Stats WS218
--------------------------------------

GO_codes - used for assigning evidence
--------------------------------------
IC Inferred by Curator
IDA Inferred from Direct Assay
IEA Inferred from Electronic Annotation
IEP Inferred from Expression Pattern
IGI Inferred from Genetic Interaction
IMP Inferred from Mutant Phenotype
IPI Inferred from Physical Interaction
ISS Inferred from Sequence (or Structural) Similarity
NAS Non-traceable Author Statement
ND No Biological Data available
RCA Inferred from Reviewed Computational Analysis
TAS Traceable Author Statement
------------------------------------------------

Total number of Gene::GO connections: 262284

Genes Stats:
----------------
Genes with GO_term connections 88744
IEA GO_code present 82544
non-IEA GO_code present 6196

Source of the mapping data
Source: *RNAi (GFF mapping overlaps) 23141
*citace 2117
*Inherited (motif & phenotype) 15002

GO_terms Stats:
---------------
Total No. GO_terms 30465
GO_terms connected to Genes 3229
GO annotations connected with IEA 1835
GO annotations connected with non-IEA 1391
Breakdown IC - 2 IDA - 349 ISS - 122
IEP - 9 IGI - 113 IMP - 706
IPI - 67 NAS - 1 ND - 1
RCA - 0 TAS - 21
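
The per-code breakdown above should add up to the non-IEA annotation count. A small sketch of that check, with the numbers copied from this section; the variable and function names are illustrative:

```python
# Non-IEA evidence code counts copied from the "Breakdown" lines above.
NON_IEA_BREAKDOWN = {
    "IC": 2, "IDA": 349, "ISS": 122,
    "IEP": 9, "IGI": 113, "IMP": 706,
    "IPI": 67, "NAS": 1, "ND": 1,
    "RCA": 0, "TAS": 21,
}
NON_IEA_REPORTED = 1391  # "GO annotations connected with non-IEA"


def breakdown_total(breakdown):
    """Sum the per-evidence-code counts."""
    return sum(breakdown.values())


print(breakdown_total(NON_IEA_BREAKDOWN) == NON_IEA_REPORTED)
```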


-===================================================================================-


Useful Stats:
---------

Genes with Sequence and CGC name
WS218 46010 (24418 elegans / 5486 briggsae / 5452 remanei / 4783
japonica / 3086 brenneri / 2785 pristionchus)
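
The WS218 figure is the sum of the per-species counts in parentheses, which also match the "entries with WormBase-approved Gene name" lines earlier in these notes. A minimal check (names are illustrative):

```python
# Per-species counts copied from the line above.
NAMED_GENES = {
    "elegans": 24418,
    "briggsae": 5486,
    "remanei": 5452,
    "japonica": 4783,
    "brenneri": 3086,
    "pristionchus": 2785,
}


def combined_total(per_species):
    """Total genes with a sequence and a CGC name across all species."""
    return sum(per_species.values())


print(combined_total(NAMED_GENES))
```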


-===================================================================================-




New Data:
---------

==================================================
WormBase release WS218 contains preliminary data from the
Caenorhabditis sp. PS1010 sequencing project (E. Schwarz et al. - in
press).

Included are genome and proteome sequences as well as gene annotation
and computational analysis.

The assembly was provided by CalTech and consists of 33588 supercontigs.
Augustus (M Stanke, University of Goettingen) was used by E Schwarz
(CalTech) to predict 22667 genes.

This data can be found at ftp.sanger.ac.uk/pub/wormbase/WS218/genomes/c_an

==================================================


Genome sequence updates:
-----------------------


New Fixes:
----------


Known Problems:
---------------


Other Changes:
--------------

Proposed Changes / Forthcoming Data:
-------------------------------------

WS219 model changes
-------------------

?Molecule - fix ?Phenotype tags missed from original submission +
additions (Karen Y.)

?Strain - Extended_genotype - signed off on the call, but testing
revealed errors (Mary Ann T.).
Dropped the XREF back to Variation

?Variation removal of obsolete/unused tags (Mary Ann T., Jolene F.)

?Phenotype - Migration of data from "Not" tags continues; additional tags
are needed for storing phenotypes scored but not observed (Wen C.)


Models.diff (Simplified diff output.)
------------------------------------------
Tags added to models:
?Strain
> Extended_genotype ?Variation

?Phenotype //added by Wen for Not observed phenotypes
> Not_in_RNAi ?RNAi XREF Phenotype_not_observed
> Not_in_Variation ?Variation XREF Phenotype_not_observed
> Not_in_Strain ?Strain XREF Phenotype_not_observed
> Not_in_Transgene ?Transgene XREF Phenotype_not_observed
> Not_in_Rearrangement ?Rearrangement XREF Phenotype_not_observed

XREFs from ?Phenotype above ?RNAi, ?Variation, ?Strain, ?Transgene,
?Rearrangement
> Phenotype_not_observed ?Phenotype XREF Not_in_Rearrangement #Phenotype_info
> Phenotype_not_observed ?Phenotype XREF Not_in_Strain #Phenotype_info
> Phenotype_not_observed ?Phenotype XREF Not_in_Transgene #Phenotype_info
> Phenotype_not_observed ?Phenotype XREF Not_in_RNAi #Phenotype_info
> Phenotype_not_observed ?Phenotype XREF Not_in_Variation #Phenotype_info

?Molecule
> Molecule_use ?Text #Evidence
> Remark ?Text #Evidence


Tags removed from models:

?Variation_name
< Name CGC_name UNIQUE ?Variation_name XREF CGC_name_for

?Variation
< ?Variation_name CGC_name_for ?Variation XREF CGC_name

< Recessive
< Semi_dominant
< Dominant
< Partially_penetrant Text // percentage of animals displaying phenotype
< Completely_penetrant
< Temperature_sensitive Heat_sensitive Text #Evidence
< Cold_sensitive Text #Evidence
< Loss_of_function UNIQUE Haplo_insufficient #Evidence
< Hypomorph #Evidence
< Amorph #Evidence
< Uncharacterised_loss_of_function #Evidence
< Gain_of_function UNIQUE Dominant_negative #Evidence
< Hypermorph #Evidence
< Neomorph #Evidence
< Uncharacterised_gain_of_function #Evidence
< Maternal Strictly_maternal
< With_maternal_effect
< Paternal

Model corrections

?Molecule
< Affects_phenotype_of Variation ?Variation #Evidence
< Strain ?Strain #Evidence
< Transgene ?Transgene #Evidence
< RNAi ?RNAi #Evidence
---
> Affects_phenotype_of Variation ?Variation ?Phenotype #Evidence
> Strain ?Strain ?Phenotype #Evidence
> Transgene ?Transgene ?Phenotype #Evidence
> RNAi ?RNAi ?Phenotype #Evidence
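
The simplified diff above uses "?Class" header lines followed by ">" lines for added tags and "<" lines for removed tags. The sketch below groups such lines by class. It is not a WormBase or acedb tool: the function name parse_models_diff is hypothetical, and it assumes every tag line belongs to the most recent line that starts with "?". Free-text headings (such as the "XREFs from ..." line) are not recognised as class headers, so the sample input is a hand-picked subset of the lines above.

```python
from collections import defaultdict

# A subset of the simplified diff above, combined into one sample for illustration.
SAMPLE = """\
?Strain
> Extended_genotype ?Variation

?Molecule
> Molecule_use ?Text #Evidence
> Remark ?Text #Evidence

?Variation
< Recessive
< Semi_dominant
"""


def parse_models_diff(text):
    """Group added ('>') and removed ('<') tag lines under their ?Class header."""
    changes = defaultdict(lambda: {"added": [], "removed": []})
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("?"):
            # Class header; drop any trailing "// comment".
            current = line.split("//", 1)[0].split()[0]
        elif line.startswith(">") and current:
            changes[current]["added"].append(line[1:].strip())
        elif line.startswith("<") and current:
            changes[current]["removed"].append(line[1:].strip())
        # Blank lines, "---" separators and free text are ignored.
    return dict(changes)


for model, change in parse_models_diff(SAMPLE).items():
    print(model, len(change["added"]), "added,", len(change["removed"]), "removed")
```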

Model Changes:
------------------------------------




For more info mail help@wormbase.org
-===================================================================================-



Quick installation guide for UNIX/Linux systems
-----------------------------------------------

1. Create a new directory to contain your copy of WormBase,
e.g. /users/yourname/wormbase

2. Uncompress and untar all of the database.*.tar.gz files into
this directory. You will need approximately 2-3 GB of disk space.

3. Obtain and install a suitable acedb binary for your system
(available from www.acedb.org).

4. Use the acedb 'xace' program to open your database, e.g.
type 'xace /users/yourname/wormbase' at the command prompt.

5. See the acedb website for more information about acedb and
using xace.
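
Steps 1 and 2 can also be scripted. The sketch below uses Python's standard tarfile module to create the target directory and extract every database.*.tar.gz archive into it. The function name unpack_release and the example paths are hypothetical; the script does not install acedb or start xace, so steps 3 to 5 still apply.

```python
import glob
import os
import tarfile


def unpack_release(download_dir, target_dir):
    """Extract all database.*.tar.gz archives from download_dir into target_dir.

    Returns the list of archives that were unpacked, in sorted order.
    Only use this on archives from a source you trust.
    """
    os.makedirs(target_dir, exist_ok=True)
    pattern = os.path.join(download_dir, "database.*.tar.gz")
    archives = sorted(glob.glob(pattern))
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target_dir)
    return archives


# Example call (paths are placeholders, adjust to your system):
# unpack_release("/users/yourname/downloads/WS218", "/users/yourname/wormbase")
```

After unpacking, the target directory is the one passed to xace in step 4.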

____________ END _____________
