WS289
New release of WormBase WS288

WS288 was built by Stavros Diamantakis with input from the WormBase Community and Team. Thank you!

-==============================================================================-
-========= FTP site structure =================================================-
-==============================================================================-

The WS288 build directory includes:

species/ DIR            - contains a sub dir for each WormBase species (G_SPECIES)
species/G_SPECIES DIR   - contains a sub dir for each NCBI genome sequencing BioProject (BIOPROJECT) for the species, with the following files:
   - G_SPECIES.BIOPROJECT.WS288.genomic.fa.gz                 - Unmasked genomic DNA
   - G_SPECIES.BIOPROJECT.WS288.genomic_masked.fa.gz          - Hard-masked (repeats replaced with Ns) genomic DNA
   - G_SPECIES.BIOPROJECT.WS288.genomic_softmasked.fa.gz      - Soft-masked (repeats lower-cased) genomic DNA
   - G_SPECIES.BIOPROJECT.WS288.protein.fa.gz                 - Current live protein set
   - G_SPECIES.BIOPROJECT.WS288.CDS_transcripts.fa.gz         - Spliced cDNA sequence for the CDS portion of protein-coding transcripts
   - G_SPECIES.BIOPROJECT.WS288.mRNA_transcripts.fa.gz        - Spliced cDNA sequence for the full-length (including UTRs) mRNA for transcripts
   - G_SPECIES.BIOPROJECT.WS288.ncrna_transcripts.fa.gz       - Spliced cDNA sequence for non-coding RNA transcripts
   - G_SPECIES.BIOPROJECT.WS288.pseudogenic_transcripts.fa.gz - Spliced cDNA sequence for pseudogenic transcripts
   - G_SPECIES.BIOPROJECT.WS288.transposon_transcripts.fa.gz  - Spliced cDNA sequence for mRNAs and pseudogenes located in Transposons
   - G_SPECIES.BIOPROJECT.WS288.transposons.fa.gz             - DNA sequence of curated and predicted Transposons
   - G_SPECIES.BIOPROJECT.WS288.transposon_cds.pep.gz         - Protein sequence of curated CDSs associated with Transposons
   - G_SPECIES.BIOPROJECT.WS288.intergenic_sequences.fa.gz    - DNA sequence between pairs of adjacent genes
   - G_SPECIES.BIOPROJECT.WS288.annotations.gff[2|3].gz       - Sequence features in either GFF2 or GFF3 format
   - G_SPECIES.BIOPROJECT.WS288.protein_annotation.gff3.gz    - Sequence features in proteins in GFF3 format
   - G_SPECIES.BIOPROJECT.WS288.canonical_geneset.gtf.gz      - Genes, transcripts and CDSs in GTF (GFF2) format
   - G_SPECIES.BIOPROJECT.WS288.ests.fa.gz                    - ESTs and mRNA sequences extracted from the public databases
   - G_SPECIES.BIOPROJECT.WS288.best_blastp_hits.txt.gz       - Best blastp matches to human, fly, yeast, and non-WormBase Uniprot proteins
   - G_SPECIES.BIOPROJECT.WS288.*pep_package.tar.gz           - Latest version of the [worm|brig|bren|rema|jap|ppa|brug]pep package (if updated since last release)
   - annotation/                                              - contains additional annotations:
      - G_SPECIES.BIOPROJECT.WS288.confirmed_genes.txt.gz             - DNA sequences of all genes confirmed by EST and/or cDNA
      - G_SPECIES.BIOPROJECT.WS288.cDNA2orf.txt.gz                    - Latest set of ORF connections to each cDNA (EST, OST, mRNA)
      - G_SPECIES.BIOPROJECT.WS288.geneIDs.txt.gz                     - List of all current gene identifiers with CGC & molecular names (when known)
      - G_SPECIES.BIOPROJECT.WS288.PCR_product2gene.txt.gz            - Mappings between PCR products and overlapping Genes
      - G_SPECIES.BIOPROJECT.WS288.*oligo_mapping.txt.gz              - Oligo array mapping files
      - G_SPECIES.BIOPROJECT.WS288.knockout_consortium_alleles.xml.gz - Table of Knockout Consortium alleles
      - G_SPECIES.BIOPROJECT.WS288.SRA_gene_expression.tar.gz         - Tables of gene expression values computed from SRA RNASeq data
      - G_SPECIES.BIOPROJECT.WS288.TSS.wig.tar.gz                     - Wiggle plot files of Transcription Start Sites from the papers WBPaper00042246, WBPaper00042529, WBPaper00042354
      - G_SPECIES.BIOPROJECT.WS288.repeats.fa.gz                      - Latest version of the repeat library for the genome, suitable for use with RepeatMasker

acedb DIR               - Everything needed to generate a local copy of the primary database
   - database.WS288.*.tar.gz   - compressed acedb database for new release
   - models.wrm.WS288          - the latest database schema (also in above database files)
   - WS288-WS287.dbcomp        - log file reporting difference from last release
   - *Non_C_elegans_BLASTX/    - This directory contains the blastx data for non-elegans species (reduces the size of the main database)

MULTI_SPECIES DIR       - miscellaneous files with data for multiple species
   - wormpep_clw.WS288.sql.bz2 - ClustalW protein multiple alignments

ONTOLOGY DIR            - gene_associations, obo files for (phenotype GO anatomy) and associated association files


Release notes on the web:
-------------------------
http://www.wormbase.org/about/release_schedule


-=====================================================================================-
-=========== C. elegans data summary =================================================-
-=====================================================================================-

Genome version
--------------
The version of the C. elegans reference genome included with this release is:

   Version Name:     WBcel235
   INSDC accession:  GCA_000002985.3
   UCSC name:        ce11

This version has been present in WormBase since release WS235


C. elegans gene data (49177 genes in total)
----------------------------------------------

Protein-coding (19984 genes):
   Curated description          5098  (25.5%)
   Automated description       19874  (99.4%)
   Human disease association    3388  (17.0%)
   Approved Gene name          10769  (53.9%)
   Reference                   11827  (59.2%)
   RNAi results                18344  (91.8%)
   Microarray results          19776  (99.0%)
   Expression patterns         19648  (98.3%)
   Variations                  19980  (100.0%)
   Interaction data            16200  (81.1%)

Non-coding RNA and pseudogene (27668 genes):
   Curated description           219  (0.8%)
   Automated description        8858  (32.0%)
   Human disease association      16  (0.1%)
   Approved Gene name          16526  (59.7%)
   Reference                    5952  (21.5%)
   RNAi results                  861  (3.1%)
   Microarray results           2267  (8.2%)
   Expression patterns           899  (3.2%)
   Variations                  27632  (99.9%)
   Interaction data              941  (3.4%)

Uncloned (1525 genes):
   Curated description           778  (51.0%)
   Automated description         116  (7.6%)
   Human disease association      10  (0.7%)
   Approved Gene name           1524  (99.9%)
   Reference                    1124  (73.7%)
   RNAi results                    0  (0.0%)
   Microarray results              0  (0.0%)
   Expression patterns            18  (1.2%)
   Variations                   1184  (77.6%)
   Interaction data              122  (8.0%)


Wormpep data set:
----------------------------
There are 28564 CDSs, from 19981 protein-coding loci
The 28564 sequences contain 40758087 base pairs in total.

   Modified entries      1
   Deleted entries       5
   New entries           2
   Reappeared entries    1
   Net change           -2


C. elegans Gene model confirmation status (based on the EST/mRNA/RNASeq evidence)
------------------------------------------------------------
   Confirmed            23532  (82.4%)  Every base of every exon has transcription evidence (mRNA/EST/RNASeq)
   Partially_confirmed   4045  (14.2%)  Some, but not all exon bases are covered by transcript evidence
   Predicted              987  (3.5%)   No coverage by mRNA/EST/RNASeq evidence


C. elegans Operons Stats
------------------------
   Live Operons       1385
   Genes in Operons   3688


C. elegans GO annotation status
-------------------------------
GO_codes - used for assigning evidence
   IBA  Inferred from Biological aspect of Ancestor
   IC   Inferred by Curator
   IDA  Inferred from Direct Assay
   IEA  Inferred from Electronic Annotation
   IEP  Inferred from Expression Pattern
   IGI  Inferred from Genetic Interaction
   IKR  Inferred from Key Residues
   IMP  Inferred from Mutant Phenotype
   IPI  Inferred from Physical Interaction
   IRD  Inferred from Rapid Divergence
   ISM  Inferred from Sequence Model
   ISO  Inferred from Sequence Orthology
   ISS  Inferred from Sequence (or Structural) Similarity
   NAS  Non-traceable Author Statement
   ND   No Biological Data available
   RCA  Inferred from Reviewed Computational Analysis
   TAS  Traceable Author Statement

Number of gene<->GO_term associations  126735

Breakdown by annotation provider:
   WormBase            18749
   UniProt             52222
   GO_Central          28770
   InterPro            20895
   IntAct               2399
   GOC                  2329
   RHEA                 1156
   SynGO                  68
   CACAO                  35
   ParkinsonsUK-UCL       30
   MGI                    29
   BHF-UCL                23
   HGNC-UCL               14
   CAFA                    7
   DisProt                 5
   ARUK-UCL                2
   HGNC                    1
   FlyBase                 1

Breakdown by evidence code:
   IEA                 67031
      Interpro2GO      20895
      Other            46136
   non-IEA             59704
      EXP                  2
      HDA                386
      HEP                320
      IBA              28749
      IC                 117
      IDA               7874
      IEP                169
      IGI               4671
      IKR                  5
      IMP              10426
      IPI               4274
      ISM                  9
      ISO                  1
      ISS               1906
      NAS                180
      ND                 429
      RCA                 13
      TAS                173

Genes Stats:
   Genes with GO_term connections           14217
      Non-IEA-only annotation                1380
      IEA-only annotation                    4879
      Both IEA and non-IEA annotations       7958

GO_term Stats:
   Distinct GO_terms connected to Genes      7030
      Associated by non-IEA only             3900
      Associated by IEA only                  835
      Associated by both IEA and non-IEA     2295


-=============================================================================-
-=========== Other core species data summary =================================-
-=============================================================================-

Assembly versions
----------------
   Brugia malayi             B_malayi-4.0            (project PRJNA10729, current since WS252)
   Caenorhabditis brenneri   C_brenneri-6.0.1b       (project PRJNA20035, current since WS227)
   Caenorhabditis briggsae   CB4                     (project PRJNA10731, current since WS254)
   Caenorhabditis japonica   C_japonica-7.0.1        (project PRJNA12591, current since WS227)
   Caenorhabditis remanei    C_remanei-15.0.1        (project PRJNA53967, current since WS185)
   Onchocerca volvulus       O_volvulus_Cameroon_v3  (project PRJEB513, current since WS241)
   Pristionchus pacificus    P_pacificus-El_Paco     (project PRJNA12644, current since WS263)
   Strongyloides ratti       S_ratti_ED321_v5_0_4    (project PRJEB125, current since WS247)
   Trichuris muris           T_muris-TMUE3.1         (project PRJEB126, current since WS264)

Approved gene symbols
---------------------
   Brugia malayi             4163
   Caenorhabditis brenneri   8528
   Caenorhabditis briggsae   8002
   Caenorhabditis japonica   6824
   Caenorhabditis remanei    8492
   Onchocerca volvulus       3213
   Pristionchus pacificus    4341
   Strongyloides ratti        109
   Trichuris muris              0

Gene counts
-----------
   Brugia malayi             11687  (10936 coding)
   Caenorhabditis brenneri   33291  (30705 coding)
   Caenorhabditis briggsae   23202  (21024 coding)
   Caenorhabditis japonica   32410  (29935 coding)
   Caenorhabditis remanei    59263  (57627 coding)
   Onchocerca volvulus       12605  (12109 coding)
   Pristionchus pacificus    26343  (26342 coding)
   Strongyloides ratti       12977  (12464 coding)
   Trichuris muris           15755  (14995 coding)


Brugia malayi Gene model confirmation status (based on the EST/mRNA/RNASeq evidence)
------------------------------------------------------------
   Confirmed             8148  (53.6%)  Every base of every exon has transcription evidence (mRNA/EST/RNASeq)
   Partially_confirmed   5704  (37.5%)  Some, but not all exon bases are covered by transcript evidence
   Predicted             1352  (8.9%)   No coverage by mRNA/EST/RNASeq evidence

Caenorhabditis brenneri Gene model confirmation status (based on the EST/mRNA/RNASeq evidence)
------------------------------------------------------------
   Confirmed             1593  (5.2%)   Every base of every exon has transcription evidence (mRNA/EST/RNASeq)
   Partially_confirmed   5706  (18.6%)  Some, but not all exon bases are covered by transcript evidence
   Predicted            23426  (76.2%)  No coverage by mRNA/EST/RNASeq evidence

Caenorhabditis briggsae Gene model confirmation status (based on the EST/mRNA/RNASeq evidence)
------------------------------------------------------------
   Confirmed            16526  (68.8%)  Every base of every exon has transcription evidence (mRNA/EST/RNASeq)
   Partially_confirmed   4559  (19.0%)  Some, but not all exon bases are covered by transcript evidence
   Predicted             2923  (12.2%)  No coverage by mRNA/EST/RNASeq evidence

Caenorhabditis japonica Gene model confirmation status (based on the EST/mRNA/RNASeq evidence)
------------------------------------------------------------
   Confirmed             9390  (26.1%)  Every base of every exon has transcription evidence (mRNA/EST/RNASeq)
   Partially_confirmed  12173  (33.8%)  Some, but not all exon bases are covered by transcript evidence
   Predicted            14413  (40.1%)  No coverage by mRNA/EST/RNASeq evidence

Caenorhabditis remanei Gene model confirmation status (based on the EST/mRNA/RNASeq evidence)
------------------------------------------------------------
   Confirmed              962  (3.1%)   Every base of every exon has transcription evidence (mRNA/EST/RNASeq)
   Partially_confirmed   5665  (18.0%)  Some, but not all exon bases are covered by transcript evidence
   Predicted            24830  (78.9%)  No coverage by mRNA/EST/RNASeq evidence

Onchocerca volvulus Gene model confirmation status (based on the EST/mRNA/RNASeq evidence)
------------------------------------------------------------
   Confirmed             6186  (50.6%)  Every base of every exon has transcription evidence (mRNA/EST/RNASeq)
   Partially_confirmed   5581  (45.7%)  Some, but not all exon bases are covered by transcript evidence
   Predicted              457  (3.7%)   No coverage by mRNA/EST/RNASeq evidence

Pristionchus pacificus Gene model confirmation status (based on the EST/mRNA/RNASeq evidence)
------------------------------------------------------------
   Confirmed              382  (1.4%)   Every base of every exon has transcription evidence (mRNA/EST/RNASeq)
   Partially_confirmed   4992  (18.9%)  Some, but not all exon bases are covered by transcript evidence
   Predicted            21059  (79.7%)  No coverage by mRNA/EST/RNASeq evidence

Strongyloides ratti Gene model confirmation status (based on the EST/mRNA/RNASeq evidence)
------------------------------------------------------------
   Confirmed              877  (7.0%)   Every base of every exon has transcription evidence (mRNA/EST/RNASeq)
   Partially_confirmed   2342  (18.8%)  Some, but not all exon bases are covered by transcript evidence
   Predicted             9265  (74.2%)  No coverage by mRNA/EST/RNASeq evidence

Trichuris muris Gene model confirmation status (based on the EST/mRNA/RNASeq evidence)
------------------------------------------------------------
   Confirmed             6262  (41.8%)  Every base of every exon has transcription evidence (mRNA/EST/RNASeq)
   Partially_confirmed   3901  (26.0%)  Some, but not all exon bases are covered by transcript evidence
   Predicted             4832  (32.2%)  No coverage by mRNA/EST/RNASeq evidence


-==============================================================================-
-=========== News for this release ============================================-
-==============================================================================-

New data sets
--------------

New/updated reference genomes
------------------------------------

Proposed Changes / Forthcoming Data
------------------------------------

Model Changes
--------------
Model changes for this release are documented here:
http://wiki.wormbase.org/index.php/WS288_Models.wrm

For more information mail help@wormbase.org

____________ END _____________