WS217

Latest revision as of 11:10, 21 December 2011

Release Letter

New release of WormBase WS217, Wormpep217 and Wormrna217 Wed Jul 28 09:25:17 BST 2010


WS217 was built by Gary Williams
-===================================================================================-
This directory includes:
i)   database.WS217.*.tar.gz    -   compressed data for new release
ii)  models.wrm.WS217           -   the latest database schema (also in above database files)
iii) CHROMOSOMES/subdir         -   contains 3 files (DNA, GFF & AGP per chromosome)
iv)  WS217-WS216.dbcomp         -   log file reporting difference from last release
v)   wormpep217.tar.gz          -   full Wormpep distribution corresponding to WS217
vi)   wormrna217.tar.gz          -   latest WormRNA release containing non-coding RNAs in the genome
vii)  confirmed_genes.WS217.gz   -   DNA sequences of all genes confirmed by EST &/or cDNA
viii) cDNA2orf.WS217.gz           -   Latest set of ORF connections to each cDNA (EST, OST, mRNA)
ix)   gene_interpolated_map_positions.WS217.gz    - Interpolated map positions for each coding/RNA gene
x)    clone_interpolated_map_positions.WS217.gz   - Interpolated map positions for each clone
xi)   best_blastp_hits.WS217.gz  - for each C. elegans WormPep protein, lists Best blastp match to
                            human, fly, yeast, C. briggsae, and SwissProt & TrEMBL proteins.
xii)  best_blastp_hits_brigprot.WS217.gz   - for each C. briggsae protein, lists Best blastp match to
                                     human, fly, yeast, C. elegans, and SwissProt & TrEMBL proteins.
xiii) geneIDs.WS217.gz   - list of all current gene identifiers with CGC & molecular names (when known)
xiv)  PCR_product2gene.WS217.gz   - Mappings between PCR products and overlapping Genes


Release notes on the web:
-------------------------
http://www.wormbase.org/wiki/index.php/Release_Schedule




Synchronisation with GenBank / EMBL:
------------------------------------

CHROMOSOME_I	sequence AF067219
CHROMOSOME_II	sequence U29244


Genome sequence composition:
----------------------------

       	WS217       	WS216      	change
----------------------------------------------
a    	32367418	32367418	  +0
c    	17780787	17780787	  +0
g    	17756985	17756985	  +0
t    	32367086	32367086	  +0
n    	0       	0       	  +0
-    	0       	0       	  +0

Total	100272276	100272276	  +0
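
The total in the table can be cross-checked from the per-base counts. The short Python sketch below does that for the WS217 column and also derives the GC fraction, which the letter itself does not report. The function name composition_summary is illustrative only and is not part of any WormBase tool.

```python
# Base counts copied from the WS217 column of the composition table.
WS217_COUNTS = {
    "a": 32367418,
    "c": 17780787,
    "g": 17756985,
    "t": 32367086,
    "n": 0,
    "-": 0,
}


def composition_summary(counts):
    """Return (total bases, GC fraction) for a dict of per-base counts."""
    total = sum(counts.values())
    gc = counts["g"] + counts["c"]
    return total, gc / total


total, gc_fraction = composition_summary(WS217_COUNTS)
print(f"total bases: {total}")  # matches the Total row: 100272276
print(f"GC fraction: {gc_fraction:.4f}")
```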


Chromosomal Changes:
--------------------
There are no changes to the chromosome sequences in this release.


Gene data set (Live C. elegans genes 40183)
-------------------------------------------
Molecular_info              38499 (95.8%)
Concise_description         5669 (14.1%)
Reference                   14039 (34.9%)
WormBase_approved Gene name 25996 (64.7%)
RNAi_result                 22928 (57.1%)
Microarray_results          21075 (52.4%)
SAGE_transcript             19147 (47.6%)




Wormpep data set:
----------------------------

There are 20387 CDS in autoace, 24705 when counting 4318 alternate splice forms.

The 24705 sequences contain 10,879,267 base pairs in total.

Modified entries      47
Deleted entries       14
New entries           67
Reappeared entries    2

Net change  +55
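
The Wormpep figures above are internally consistent, which the following Python sketch checks. It assumes that the net change is new plus reappeared minus deleted entries, and that modified entries leave the count unchanged. This is inferred from the numbers and is not stated in the letter.

```python
# Figures copied from the Wormpep data set section.
cds_count = 20387          # CDS in autoace
alt_splice_forms = 4318    # alternate splice forms
reported_total = 24705     # total sequences reported

changes = {"modified": 47, "deleted": 14, "new": 67, "reappeared": 2}


def net_change(changes):
    """Net change in entry count (assumption: modified entries do not count)."""
    return changes["new"] + changes["reappeared"] - changes["deleted"]


# CDS plus alternate splice forms should give the reported total.
assert cds_count + alt_splice_forms == reported_total

print(f"net change: {net_change(changes):+d}")  # prints "net change: +55"
```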




Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed             10162 (41.1%)	Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed   11800 (47.8%)	Some, but not all exon bases are covered by transcript evidence
Predicted              2743 (11.1%)	No transcriptional evidence at all
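
The percentages in this table can be reproduced from the raw counts. The sketch below assumes each percentage is the count divided by the sum of the three status classes, rounded to one decimal place; the helper name status_percentages is illustrative.

```python
# Counts copied from the confidence-level table.
STATUS_COUNTS = {
    "Confirmed": 10162,
    "Partially_confirmed": 11800,
    "Predicted": 2743,
}


def status_percentages(counts):
    """Return (total, {status: percentage rounded to one decimal place})."""
    total = sum(counts.values())
    percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, percentages


total_entries, percentages = status_percentages(STATUS_COUNTS)
print(total_entries)  # 24705, the Wormpep sequence total
for name, pct in percentages.items():
    print(f"{name:<20}{pct}%")
```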



Status of entries: Protein Accessions
-------------------------------------
UniProtKB accessions  24505 (99.2%)



Status of entries: Protein_IDs in EMBL
--------------------------------------
Protein_id            24505 (99.2%)



Gene <-> CDS,Transcript,Pseudogene connections
----------------------------------------------
Caenorhabditis elegans entries with WormBase-approved Gene name  24360



C. elegans Operons Stats
---------------------------------------------
Description: These exist as closely spaced gene clusters similar to bacterial operons
---------------------------------------------
| Live Operons        1267                |
| Genes in Operons    3268                |



GO Annotation Stats WS217
--------------------------------------

GO_codes - used for assigning evidence
--------------------------------------
IC  Inferred by Curator
IDA Inferred from Direct Assay
IEA Inferred from Electronic Annotation
IEP Inferred from Expression Pattern
IGI Inferred from Genetic Interaction
IMP Inferred from Mutant Phenotype
IPI Inferred from Physical Interaction
ISS Inferred from Sequence (or Structural) Similarity
NAS Non-traceable Author Statement
ND  No Biological Data available
RCA Inferred from Reviewed Computational Analysis
TAS Traceable Author Statement
------------------------------------------------

Total number of Gene::GO connections:  262140

Genes Stats:
----------------
Genes with GO_term connections         88755  
           IEA GO_code present         82558  
       non-IEA GO_code present         6194  

Source of the mapping data             
Source: *RNAi (GFF mapping overlaps)   23034  
        *citace                        2046  
        *Inherited (motif & phenotype) 15016  

GO_terms Stats:
---------------
Total No. GO_terms                     30460  
GO_terms connected to Genes            3195  
GO annotations connected with IEA      1833  
GO annotations connected with non-IEA  1359  
   Breakdown  IC - 2   IDA - 331   ISS - 123 
             IEP - 9   IGI - 114   IMP - 694 
             IPI - 63  NAS - 1     ND  - 1  
             RCA - 0   TAS - 21   


------------------------------------------------

Tier II Gene counts
---------------------------------------------
pristionchus Gene count 29638 (Coding 29639)
remanei Gene count 32431 (Coding 31476)
heterorhabditis Gene count 0 (Coding 0)
japonica Gene count 27177 (Coding 25870)
briggsae Gene count 23044 (Coding 21997)
brenneri Gene count 32288 (Coding 30663)
---------------------------------------------




-===================================================================================-



New Data:
---------
WGS Data

  Data from two WGS projects has been submitted to WormBase:
  57 alleles from Sarin et al., PMID 20439776 (all of biological interest).
  2723 alleles from Flibotte et al., PMID 20439774 (1633 of biological interest).

  We are establishing a pipeline for submitting the large quantities of
  WGS data we expect to receive, so this is an initial representation
  of the data and may change over time.


Orthology and OMIM

  Orthology predictions to 50 eukaryotes were included, based on EnsEMBL
  release 58 and the frozen WormBase release WS210. In addition,
  information on inherited human diseases was updated from OMIM.

  Based on user feedback, the WormBase GFF2 files include the public
  names of variations and RNAi experiments. In addition, non-coding RNAs
  now show the same information as their coding equivalents.

Blast databases updated

  ipi_human
  flybase
  yeast


Genome sequence updates:
-----------------------

None

New Fixes:
----------

None

Known Problems:
---------------

None

Other Changes:
--------------

None

Proposed Changes / Forthcoming Data:
-------------------------------------


Model Changes:
------------------------------------

Added Brief_id to the ?Position_Matrix class (for Xiaodong) and the ?Molecule class (for Karen)


For more info mail help@wormbase.org
-===================================================================================-


Quick installation guide for UNIX/Linux systems
-----------------------------------------------

1. Create a new directory to contain your copy of WormBase,
	e.g. /users/yourname/wormbase

2. Uncompress and untar all of the database.*.tar.gz files into
	this directory. You will need approximately 2-3 GB of disk space.

3. Obtain and install a suitable acedb binary for your system
	(available from www.acedb.org).

4. Use the acedb 'xace' program to open your database, e.g.
	type 'xace /users/yourname/wormbase' at the command prompt.

5. See the acedb website for more information about acedb and
	using xace.
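
Steps 1 and 2 above can be scripted. The following Python sketch uses only the standard library and assumes the database.*.tar.gz files have already been downloaded into a single directory. The function name unpack_release and the download path are hypothetical. Installing acedb and running xace (steps 3 and 4) still has to be done separately.

```python
import glob
import os
import tarfile


def unpack_release(download_dir, wormbase_dir):
    """Extract every database.*.tar.gz found in download_dir into wormbase_dir.

    Returns the list of archives that were unpacked.
    """
    # Step 1: create the directory that will hold the database.
    os.makedirs(wormbase_dir, exist_ok=True)

    # Step 2: uncompress and untar each archive in one pass.
    pattern = os.path.join(download_dir, "database.*.tar.gz")
    archives = sorted(glob.glob(pattern))
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            # Only extract archives obtained from a trusted source.
            tar.extractall(wormbase_dir)
    return archives


# Example call (hypothetical download location):
#   unpack_release("/users/yourname/downloads", "/users/yourname/wormbase")
# Afterwards, open the database at the command prompt with:
#   xace /users/yourname/wormbase
```

If no archives match the pattern, the function returns an empty list, so a caller can detect that nothing was unpacked.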

____________  END _____________