Difference between revisions of "Downloads"

From WormBaseWiki
 
(33 intermediate revisions by 9 users not shown)
Line 1: Line 1:
 +
[[File:Warning.jpg|100px]]
 +
<span style="color:red">
 +
'''Warning:''' The WormBase ftp site is currently undergoing reorganisation so some file/links may be absent/not working.
 +
</span>
 +
[[File:Warning.jpg|100px]]
 +
 +
This page is going to be replace in the near future with an auto generated page that reflects the real data storage.
 +
 +
Please visit [ftp://ftp.wormbase.org/pub/wormbase/releases/current-www.wormbase.org-release Directly] for consistent file browsing
 +
 +
Please email help@wormbase.org if you have any trouble finding what you want.
 +
 +
__TOC__
 +
 
WormBase maintains a public FTP site where you can find many commonly requested files and datasets, the WormBase software and prepackaged databases:
 
WormBase maintains a public FTP site where you can find many commonly requested files and datasets, the WormBase software and prepackaged databases:
  
Line 7: Line 21:
 
== Published datasets hosted at WormBase ==
 
== Published datasets hosted at WormBase ==
  
For easier distribution of data, WormBase offers to host published datasets. These can be found in the [ftp://ftp.wormbase.org/pub/wormbase/datasets datasets] directory on our FTP site. If you would like to host your data at WormBase, please contact [[User:Tharris Todd Harris (harris@cshl.edu)].
+
For easier distribution of data, WormBase offers to host published datasets. These can be found in the [ftp://ftp.wormbase.org/pub/wormbase/datasets datasets] directory on our FTP site. If you would like to host your data at WormBase, please contact [Todd Harris (harris@cshl.edu)].
  
== Genomic annotations in GFF format ==
+
== Genomic annotations in GFF format ==
  
WormBase provides raw annotations for integration into your own local database. These genomic annotations are distributed in the [http://www.sanger.ac.uk/Software/formats/GFF/ GFF file format] (both versions 1 and 2). Such files can be loaded into a relational schema using the Perl module [[Mining_WormBase_with_Bio::DB::GFF Bio::DB::GFF]]. Following the release of a new database from our team at the Wellcome Trust Sanger Institute, some additional post-processing of the GFF files occurs in order to create the variant that we use at WormBase. We use this final file to power the genome browser, dump sequences, generate images on the Gene Summary pages, etc.
+
WormBase provides raw annotations for integration into your own local database. These genomic annotations are distributed in the [http://www.sanger.ac.uk/Software/formats/GFF/ GFF file format] (both versions 1 and 2). Such files can be loaded into a relational schema using the Perl module [[Mining WormBase with Bio::DB::GFF Bio::DB::GFF|Mining_WormBase_with_Bio::DB::GFF Bio::DB::GFF]]. Following the release of a new database from our team at the Wellcome Trust Sanger Institute, some additional post-processing of the GFF files occurs in order to create the variant that we use at WormBase. We use this final file to power the genome browser, dump sequences, generate images on the Gene Summary pages, etc.  
  
You may wish to check the [[Release_notes|Release_notes]] for any last minute bugs or issues with the database.
+
You may wish to check the [[Release Schedule|Release schedule]] for any last minute bugs or issues with the database.  
  
=== GFF2 ===
+
=== GFF2 ===
  
C. elegans current release [ftp://ftp.wormbase.org/pub/wormbase/acedb/current_release/CHROMOSOMES pre-processed] / [ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/genome_feature_tables/GFF2/current.gff2.gz post-processed.gff2.gz]
+
C. elegans current release [ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/gff/c_elegans.current.annotations.gff2.gz post-processed.gff2.gz]  
  
C. briggsae current release [ftp://ftp.wormbase.org/pub/wormbase/genomes/briggsae/genome_feature_tables/GFF2/current.gff.gz post-processed.gff2.gz] (waba lines may need to be changed to target Sequence:XYZ instead of CDS:XYZ)
+
C. briggsae current release [ftp://ftp.wormbase.org/pub/wormbase/species/c_briggsae/gff/c_briggsae.current.annotations.gff2.gz post-processed.gff2.gz] (waba lines may need to be changed to target Sequence:XYZ instead of CDS:XYZ)  
  
C. remanei current release [ftp://ftp.wormbase.org/pub/wormbase/genomes/remanei/preliminary_analysis/ preliminary_analysis/]
+
C. remanei current release [ftp://ftp.wormbase.org/pub/wormbase/species/c_remanei/gff/c_remanei.current.annotations.gff2.gz post-processed.gff2.gz]
  
=== GFF3 ===
+
C. brenneri current release [ftp://ftp.wormbase.org/pub/wormbase/species/c_brenneri/gff/c_brenneri.current.annotations.gff2.gz post-processed.gff2.gz]
 +
 
 +
C. japonica current release [ftp://ftp.wormbase.org/pub/wormbase/species/c_japonica/gff/c_japonica.current.annotations.gff2.gz post-processed.gff2.gz]
 +
 
 +
P. pacificus current release [ftp://ftp.wormbase.org/pub/wormbase/species/p_pacificus/gff/p_pacificus.current.annotations.gff2.gz post-processed.gff2.gz]
 +
 
 +
=== GFF3 ===
 +
 
 +
C. angaria current release [ftp://ftp.wormbase.org/pub/wormbase/species/c_angaria/gff/c_angaria.current.annotations.gff3.gz current.gff3.gz]
 +
 
 +
B. malayi current release [ftp://ftp.wormbase.org/pub/wormbase/species/b_malayi/gff/b_malayi.current.annotations.gff3.gz current.gff3.gz]
 +
 
 +
H. contortus current release [ftp://ftp.wormbase.org/pub/wormbase/species/h_contortus/gff/h_contortus.current.annotations.gff3.gz current.gff3.gz]
 +
 
 +
M. hapla current release [ftp://ftp.wormbase.org/pub/wormbase/species/m_hapla/gff/m_hapla.current.annotations.gff3.gz current.gff3.gz]
 +
 
 +
M. incognita current release [ftp://ftp.wormbase.org/pub/wormbase/genomes/m_incognita/gff/m_incognita.current.annotations.gff3.gz current.gff3.gz]
 +
 
 +
t_spiralis current release [ftp://ftp.wormbase.org/pub/wormbase/genomes/t_spiralis/gff/t_spiralis.current.annotations.gff3.gz current.gff3.gz]
  
C. elegans current release [ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/genome_feature_tables/GFF3/current.gff3.gz current.gff3.gz]
 
  
 
The GFF3 format is part of the [http://www.sequenceontology.org Sequence Ontology]. You can read about it in the [http://www.sequenceontology.org/gff3.shtml GFF3 specification]. There is also an online [http://dev.wormbase.org/db/validate_gff3/validate_gff3_online GFF3 validator] for checking the validity of GFF3-format files.
 
The GFF3 format is part of the [http://www.sequenceontology.org Sequence Ontology]. You can read about it in the [http://www.sequenceontology.org/gff3.shtml GFF3 specification]. There is also an online [http://dev.wormbase.org/db/validate_gff3/validate_gff3_online GFF3 validator] for checking the validity of GFF3-format files.
  
== Sequence Data in FASTA Format ==
+
== Sequence Data in FASTA Format ==
  
C. elegans current release [ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/sequences/dna]
+
<b><u>DNA</u></b><br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/sequence/genomic/c_elegans.current.genomic.fa.gz Current C. elegans] <br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/c_briggsae/sequence/genomic/c_briggsae.current.genomic.fa.gz Current C. briggsae]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/c_remanei/sequence/genomic/c_remanei.current.genomic.fa.gz Current C. remanei]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/c_brenneri/sequence/genomic/c_brenneri.current.genomic.fa.gz Current C. brenneri]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/c_japonica/sequence/genomic/c_japonica.current.genomic.fa.gz Current C. japonica]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/p_pacificus/sequence/genomic/p_pacificus.current.genomic.fa.gz Current P. pacificus]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/h_bacteriophora/sequence/genomic/h_bacteriophora.current.genomic.fa.gz Current H. bacteriophora]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/b_malayi/sequence/genomic/b_malayi.current.genomic.fa.gz Current B. malayi]<br>
  
C. briggsae current release [ftp://ftp.wormbase.org/pub/wormbase/genomes/briggsae/sequences/dna]
+
<b><u>Protein</u></b><br>
 
+
[ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/sequence/protein/c_elegans.current.protein.fa.gz Current C. elegans protein data]<br>
For details of the C.elegans protein set see these [http://www.sanger.ac.uk/Projects/C_elegans/WORMBASE/current/wormpep.shtml WORMPEP pages]. This gives a description of the various files included in the WORMPEP release as well as links to download current or previous versions.
+
[ftp://ftp.wormbase.org/pub/wormbase/species/c_briggsae/sequence/protein/c_briggsae.current.protein.fa.gz Current C. briggsae protein data]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/c_remanei/sequence/protein/c_remanei.current.protein.fa.gz Current C. remanei protein data]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/c_brenneri/sequence/protein/c_brenneri.current.protein.fa.gz Current C. brenneri protein data]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/c_japonica/sequence/protein/c_japonica.current.protein.fa.gz Current C. japonica protein data]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/p_pacificus/sequence/protein/p_pacificus.current.protein.fa.gz Current P. pacificus protein data]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/h_bacteriophora/sequence/protein/h_bacteriophora.current.protein.fa.gz Current H. bacteriophora protein data]<br>
 +
[ftp://ftp.wormbase.org/pub/wormbase/species/b_malayi/sequence/protein/b_malayi.current.protein.fa.gz Current B. malayi protein data]
  
 
== Microarray data ==
 
== Microarray data ==
  
All microarray data available in WormBase: ftp://caltech.wormbase.org/pub/annots/microarray/.
+
Up-to-date mapping of microarray probes to WormBase genes for [ftp://ftp.wormbase.org/pub/wormbase/genomes/c_elegans/annotations/microarray/affy_oligo_mapping.gz Affymetrix] [ftp://ftp.wormbase.org/pub/wormbase/genomes/c_elegans/annotations/microarray/agil_oligo_mapping.gz Agilent] [ftp://ftp.wormbase.org/pub/wormbase/genomes/c_elegans/annotations/microarray/gsc_oligo_mapping.gz WashU GSC] [ftp://ftp.wormbase.org/pub/wormbase/genomes/c_elegans/annotations/smd_oligo_mapping.gz SMD] chips.
  
Microarray data from specific publications or for a set of probes: http://elbrus.caltech.edu/cgi-bin/igor/microarraytools/dump_microarray_data_mysql.cgi.
+
== RNAi clone mapping ==
  
Up-to-date mapping of microarray probes to WormBase genes for [ftp://ftp.wormbase.org/pub/wormbase/acedb/current_release/affy_oligo_mapping.gz Affymetrix] [ftp://ftp.wormbase.org/pub/wormbase/acedb/current_release/agil_oligo_mapping.gz Agilent] [ftp://ftp.wormbase.org/pub/wormbase/acedb/current_release/gsc_oligo_mapping.gz WashU GSC] chips.
+
Up-to-date mapping of Julie Ahringer RNAi library clones to current WormBase gene models: ftp://caltech.wormbase.org/pub/annots/rnai/
  
 
== Database dumps ==
 
== Database dumps ==
Line 60: Line 105:
 
http://www.acedb.org
 
http://www.acedb.org
 
|}
 
|}
 
== Prepackaged databases ==
 
 
For your convenience, WormBase offers prepackaged databases that make running your own copy of WorrmBase much simpler. These can be found in the [ftp://ftp.wormbase.org/pub/wormbase/mirror/database_tarballs database_tarballs] directory on the FTP site.
 
 
These prepacked databases include AceDB, C. elegans and C. briggsae GFF databases for MySQL and Blast/BLAT databases. You will probably want to use the [ftp://ftp.wormbase.org/pub/wormbase/mirror/database_tarballs/live_release current production version] of the databases. We also provide access to the [ftp://ftp.wormbase.org/pub/wormbase/mirror/database_tarballs/development_release development version] of the database.
 
  
 
== Literature citations ==
 
== Literature citations ==
Line 75: Line 114:
 
All C. elegans citations
 
All C. elegans citations
 
|
 
|
[ftp://ftp.wormbase.org/pub/wormbase/misc/literature/current-wormbase-literature.endnote.gz current-wormbase-literature.endnote.gz]
+
[http://tazendra.caltech.edu/~postgres/michael/wbpapers.endnote current-wormbase-literature.endnote.gz]
 
|-
 
|-
 
|
 
|
Line 82: Line 121:
 
[http://www.wormbook.org/endnote/references.txt references.txt]
 
[http://www.wormbook.org/endnote/references.txt references.txt]
 
|}
 
|}
 +
 +
 +
 +
[[Category:User Guide]]

Latest revision as of 22:33, 24 October 2011

Warning: The WormBase FTP site is currently undergoing reorganisation, so some files and links may be absent or not working.

This page will be replaced in the near future with an auto-generated page that reflects the real data storage.

Please visit the FTP site directly at ftp://ftp.wormbase.org/pub/wormbase/releases/current-www.wormbase.org-release for consistent file browsing.

Please email help@wormbase.org if you have any trouble finding what you want.

WormBase maintains a public FTP site where you can find many commonly requested files and datasets, the WormBase software and prepackaged databases:

ftp://ftp.wormbase.org/pub/wormbase/

Please see the individual READMEs in each directory describing the contents of the directory. For convenience, select files are directly linked below.

Published datasets hosted at WormBase

For easier distribution of data, WormBase offers to host published datasets. These can be found in the datasets directory on our FTP site. If you would like to host your data at WormBase, please contact Todd Harris (harris@cshl.edu).

Genomic annotations in GFF format

WormBase provides raw annotations for integration into your own local database. These genomic annotations are distributed in the GFF file format (both versions 1 and 2). Such files can be loaded into a relational schema using the Perl module Bio::DB::GFF (see Mining WormBase with Bio::DB::GFF). After our team at the Wellcome Trust Sanger Institute releases a new database, the GFF files undergo some additional post-processing to create the variant that we use at WormBase. We use this final file to power the genome browser, dump sequences, generate images on the Gene Summary pages, etc.
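
As a rough illustration of what a loader has to handle, the sketch below splits a tab-delimited GFF line into its standard columns in Python. It is not the Bio::DB::GFF loader. The GffRecord type and the function names are hypothetical and introduced only for this example.

```python
import gzip
from typing import Iterator, NamedTuple, Optional


class GffRecord(NamedTuple):
    """One GFF feature line (hypothetical container for this sketch)."""
    seqid: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    frame: str
    attributes: str


def parse_gff_line(line: str) -> Optional[GffRecord]:
    """Parse one GFF line; return None for blank lines and '#' comments."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    # Split on the first eight tabs only, so the attribute column stays whole.
    fields = line.split("\t", 8)
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 tab-separated columns: {line!r}")
    seqid, source, feature, start, end, score, strand, frame = fields[:8]
    attributes = fields[8] if len(fields) > 8 else ""
    return GffRecord(
        seqid, source, feature,
        int(start), int(end),
        None if score == "." else float(score),  # '.' means "no score"
        strand, frame, attributes,
    )


def read_gff(path: str) -> Iterator[GffRecord]:
    """Yield records from a gzip-compressed GFF file on local disk."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            record = parse_gff_line(line)
            if record is not None:
                yield record
```

For example, read_gff could be pointed at a locally downloaded copy of one of the .gff2.gz or .gff3.gz files listed below. The attribute column is left as a raw string because its syntax differs between GFF2 and GFF3.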

You may wish to check the Release schedule for any last minute bugs or issues with the database.

GFF2

C. elegans current release post-processed.gff2.gz

C. briggsae current release post-processed.gff2.gz (waba lines may need to be changed to target Sequence:XYZ instead of CDS:XYZ)

C. remanei current release post-processed.gff2.gz

C. brenneri current release post-processed.gff2.gz

C. japonica current release post-processed.gff2.gz

P. pacificus current release post-processed.gff2.gz
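
The C. briggsae note above says that waba lines may need their target changed from CDS:XYZ to Sequence:XYZ. A minimal sketch of that rewrite follows. It assumes that waba lines can be recognised by a source column (column 2) starting with "waba" and that the target appears in the attribute column as Target "CDS:...". Both assumptions should be checked against the actual file before use, and the function name is hypothetical.

```python
def retarget_waba_line(line: str) -> str:
    """Rewrite Target "CDS:..." to Target "Sequence:..." on waba lines only.

    Assumption: waba features have a source column beginning with 'waba'.
    All other lines (comments, other features) are returned unchanged.
    """
    fields = line.split("\t")
    if len(fields) < 9 or not fields[1].startswith("waba"):
        return line
    # Only the attribute column is touched; coordinates are left as-is.
    fields[8] = fields[8].replace('Target "CDS:', 'Target "Sequence:')
    return "\t".join(fields)
```

Applied line by line to a decompressed GFF2 file, this leaves every non-waba line byte-for-byte identical, which makes the result easy to compare against the original with a diff tool.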

GFF3

C. angaria current release current.gff3.gz

B. malayi current release current.gff3.gz

H. contortus current release current.gff3.gz

M. hapla current release current.gff3.gz

M. incognita current release current.gff3.gz

T. spiralis current release current.gff3.gz


The GFF3 format is part of the Sequence Ontology. You can read about it in the GFF3 specification. There is also an online GFF3 validator for checking the validity of GFF3-format files.

Sequence Data in FASTA Format

DNA
Current C. elegans
Current C. briggsae
Current C. remanei
Current C. brenneri
Current C. japonica
Current P. pacificus
Current H. bacteriophora
Current B. malayi

Protein
Current C. elegans protein data
Current C. briggsae protein data
Current C. remanei protein data
Current C. brenneri protein data
Current C. japonica protein data
Current P. pacificus protein data
Current H. bacteriophora protein data
Current B. malayi protein data
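
The DNA and protein files above are gzip-compressed FASTA (.fa.gz). As a minimal sketch, assuming a file has already been downloaded to local disk, the following Python reads records without decompressing the file first. The function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def iter_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_lengths(path: str) -> Dict[str, int]:
    """Map each FASTA header in a gzip-compressed file to its sequence length."""
    with gzip.open(path, "rt") as handle:
        return {header: len(seq) for header, seq in iter_fasta(handle)}
```

Calling sequence_lengths on a downloaded genomic file gives a quick sanity check that every expected sequence is present before loading the matching GFF annotations.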

Microarray data

Up-to-date mapping of microarray probes to WormBase genes for Affymetrix, Agilent, WashU GSC, and SMD chips.

RNAi clone mapping

Up-to-date mapping of Julie Ahringer RNAi library clones to current WormBase gene models: ftp://caltech.wormbase.org/pub/annots/rnai/

Database dumps

Databases and software

The official WormBase software

ftp://ftp.wormbase.org/pub/wormbase/software

AceDB, the database that drives WormBase

http://www.acedb.org

Literature citations

Literature citations, pre-formatted for import into the Endnote citation manager.

All C. elegans citations

current-wormbase-literature.endnote.gz

WormBook citations ONLY

references.txt