Software Life Cycle: 1. Updating The Development Server
Contents
- 1 Overview
- 2 Document Conventions
- 3 Staging Pipeline Code
- 4 Running the Update Pipeline
- 4.1 Log Files
- 4.2 Executing the Pipeline
- 4.3 Update Steps
- 4.4 Update Steps
- 4.4.1 Compile Gene Resources
- 4.4.2 Compile Orthology Resources
- 4.4.3 Compile Interaction Data
- 4.4.4 Create ePCR databases for available species
- 4.4.5 Build and Load GFF patches
- 4.4.6 Convert GFF2 into GFF3
- 4.4.7 Create a GBrowse-driven genetic map
- 4.4.8 Create a GBrowse-driven physical map
- 4.4.9 Create dump files of common datasets
- 4.4.10 Load the CLUSTALW database
- 4.4.11 Mirror annotation files from Sanger to the FTP site
- 4.5 Compiled File Table
- 4.6 Update Records
Overview
This document describes the process of staging a new release of WormBase on the development server.
The automated staging pipeline consists of:
- a harness that handles logging, error trapping, and basic shared functions
- a suite of modules -- one per step -- that implement the step or make calls to helper scripts
- helper scripts in Perl or shell that assist in implementation
Controlling the pipeline: you can run the pipeline in three ways:
- Launch the full pipeline via the control script. This is the preferred, fully automated method.
- Run individual steps in the context of the pipeline using the control scripts in steps/. This is useful if the pipeline fails at a specific point.
- Run helper scripts directly, outside the pipeline's logging facilities. This is useful if you need to rebuild something quickly.
Document Conventions
The current development server is
wb-dev: wb-dev.oicr.on.ca (FQDN); aka: dev.wormbase.org
When indicated, substitute WSXXX for ${RELEASE}.
System paths referred to in this document:
FTP      : /usr/local/ftp/pub/wormbase
WORMBASE : /usr/local/wormbase
ACEDB    : /usr/local/wormbase/acedb
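So that the Usage lines on this page can be pasted into a shell as written, the path conventions above can be exported as environment variables. This is a minimal sketch: the variable names simply mirror the placeholders used on this page, and RELEASE falls back to the placeholder WSXXX, which you should override with the release being staged.

```shell
#!/usr/bin/env bash
# Path conventions from this page, exported so that Usage lines containing
# ${FTP}, ${WORMBASE}, ${ACEDB} and ${RELEASE} can be pasted as-is.
export FTP=/usr/local/ftp/pub/wormbase
export WORMBASE=/usr/local/wormbase
export ACEDB="$WORMBASE/acedb"
# Keep any RELEASE already set in the environment; otherwise use the page's
# placeholder. Replace WSXXX with the release being staged.
export RELEASE="${RELEASE:-WSXXX}"
```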
Staging Pipeline Code
The update pipeline code is available in the website-admin repository on GitHub:
tharris> git clone git@github.com:WormBase/website-admin.git
tharris> cd website-admin/update
- lib/ -- the shared library suite that handles updates.
- staging/ -- code related to staging data on the development site.
- production/ -- code related to releasing data and code into production.
The contents are:
- logs/ -- the logs directory for each step/update
- bin/ -- Perl scripts for manually launching individual steps
- README.txt -- directory listing
- updatelog.conf -- a configuration file for the update process
- update.sh -- master script that fires off each step of the pipeline
- util/ -- various helper scripts for the update process
Running the Update Pipeline
Log Files
The Staging Pipeline creates informative logs for each step of the process. Logs are located at:
/usr/local/wormbase/logs/staging/WSXXX
- master.log -- the master log tracks all steps and is useful for a meta-view of the pipeline. It contains INFO, WARN, ERROR, and FATAL messages.
- master.err -- the master error log tracks ERROR and FATAL messages encountered across all steps.
Each step also writes its own log files, capturing the STDOUT and STDERR of that step. These are useful for tracking progress and triaging problems. For example:
/usr/local/wormbase/logs/staging/WSXXX/build_blast_databases/
- step.log -- step-specific log tracking everything from TRACE on up.
- step.err -- step-specific error log tracking ERROR and FATAL messages. A good place to check if a step breaks.
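Since every step writes its own step.err, a quick way to find which steps reported problems is to list the steps whose step.err is non-empty. The sketch below assumes only the layout described above (<log root>/<release>/<step>/step.err); the function name failed_steps is hypothetical, and the optional second argument overriding the log root exists just to make it easy to test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pipeline steps whose step.err log is non-empty.
# Assumes the layout <log root>/<RELEASE>/<step name>/step.err.
failed_steps() {
    local release="$1"
    local root="${2:-/usr/local/wormbase/logs/staging}"
    local err
    for err in "$root/$release"/*/step.err; do
        # -s is true only if the file exists and is larger than zero bytes,
        # so an unmatched glob or an empty error log is skipped.
        if [ -s "$err" ]; then
            basename "$(dirname "$err")"
        fi
    done
}
```

For example, `failed_steps WSXXX` prints one step name per line, such as build_blast_databases, whose step.err you can then open.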
Executing the Pipeline
A single script fires off all steps of the process. Run it inside a screen session so that it survives if you disconnect.
tharris> screen
tharris> ./stage_via_pipeline.pl WSXXX
(to detach from your screen session)
tharris> ^a ^d
(to resume your screen session)
tharris> screen -r
Monitor progress of the update by following the master log file:
tharris> tail -f /usr/local/wormbase/logs/staging/WSXXX/master.log
Update Steps
The steps that make up the pipeline, the control script that launches each one, any helper scripts, and the module that implements each step are listed below.
step | control script | helper scripts | module |
---|---|---|---|
Mirror a new release | steps/mirror_new_release.pl (manual) | | W::U::Staging::MirrorNewRelease |
Unpack AceDB | steps/unpack_acedb.pl (manual) | | W::U::Staging::UnpackAcedb |
Create directories | steps/create_directories.pl | | W::U::Staging::CreateDirectories |
Create BLAST databases | steps/create_blast_databases.pl | helpers/create_blastdb_nucleotide.sh, create_blastdb_protein.sh | W::U::Staging::CreateBlastDatabases |
Create BLAT databases | steps/create_blat_databases.pl | | W::U::Staging::CreateBlatDatabases |
Load Genomic GFF databases | steps/load_genomic_gff_databases.pl | | W::U::Staging::LoadGenomicGFFDatabases |
- Compile Gene Resources
- Mirror ontology files from Sanger
- Compile ontology resources for the site
- Compile orthology resources
- Compile interaction resources
- Create ePCR databases for select species
- Build and load GFF patches
- Convert GFF2 into GFF3
- Create a GBrowse-driven genetic map
- Create a GBrowse-driven physical map
- Update strains database
- Create dump files of common datasets
- Mirror annotation files from Sanger to the FTP site
- Load CLUSTAL db
Mirror a new release
New releases are mirrored directly from the Hinxton FTP site to the primary WormBase FTP site hosted on wb-dev:/usr/local/ftp. This process is run via cron but can also be run manually.
# Mirror the next incremental release newer than what we already have:
./steps/mirror_new_release.pl

# Or mirror a specific release, e.g. WS150, to /usr/local/ftp/pub/wormbase/releases/WS150:
./steps/mirror_new_release.pl WS150
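This page does not describe how mirror_new_release.pl decides which release is "next". Purely to illustrate the idea of the next incremental release, the hypothetical helper below finds the highest WSnnn directory already present locally and adds one; it is not the script's actual logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): print the release that
# follows the newest WSnnn directory in the local releases directory.
next_release() {
    local dir="${1:-/usr/local/ftp/pub/wormbase/releases}"
    local latest
    # Keep entries named exactly WS<digits>, strip the prefix, take the largest.
    latest=$(ls -1 "$dir" 2>/dev/null \
        | sed -n 's/^WS\([0-9][0-9]*\)$/\1/p' \
        | sort -n | tail -n 1)
    if [ -z "$latest" ]; then
        echo "next_release: no WSnnn releases found in $dir" >&2
        return 1
    fi
    # 10# forces base 10 so a value such as 099 is not read as octal.
    echo "WS$((10#$latest + 1))"
}
```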
Create necessary directories
Create staging directories for the update process.
Usage  : ./steps/create_directories.pl ${RELEASE}
Output : Directories in ${WORMBASE}/databases
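The exact set of directories this step creates is not listed here. As a hypothetical sketch only, the per-release output directories named by later steps on this page (blast, blat, ontology, gene, orthology, interaction, epcr) could be created as follows; the real create_directories.pl may create a different set, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: create the per-release output directories that later
# steps on this page write to. The real create_directories.pl may differ.
create_release_dirs() {
    local release="$1"
    local wormbase="${2:-/usr/local/wormbase}"
    local sub
    for sub in blast blat ontology gene orthology interaction epcr; do
        # -p creates parents as needed and is a no-op if the directory exists.
        mkdir -p "$wormbase/databases/$release/$sub" || return 1
    done
}
```

Because mkdir -p is idempotent, running it twice for the same release is harmless.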
Unpack Acedb
Unpack AceDB from the new release and customize the new installation with the skeletal files located at /usr/local/wormbase/website/classic/wspec. Files are unpacked under /usr/local, so make sure there is sufficient space there: you will need approximately 25 GB of disk space per release.
via pipeline : ./steps/unpack_acedb.pl ${RELEASE}
via helper   : helpers/unpack_acedb.sh ${RELEASE}
Input        : Files staged at ${FTP}/releases/${RELEASE}/species
Output       : Unpacked AceDB files in ${ACEDB}/wormbase_${RELEASE}
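Given the roughly 25 GB needed per release, it is worth checking free space before unpacking. A minimal sketch using the standard df and awk tools; the function name has_free_space is hypothetical, and the 25 GB figure comes from the paragraph above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEED_GB gigabytes available.
has_free_space() {
    local dir="$1" need_gb="$2"
    local avail_kb
    # -P: POSIX output format (one line per filesystem, no line wrapping)
    # -k: sizes in 1024-byte blocks; column 4 is the available space
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}
```

For example: `has_free_space /usr/local 25 || echo "less than 25 GB free under /usr/local"`.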
When complete, you should have a new acedb directory containing:
- database
- wgf
- wquery
- wspec
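Before testing the database itself, you can confirm that the unpacked directory has the four subdirectories listed above. A small sketch; check_acedb_layout is a hypothetical name, and the directory to check would be ${ACEDB}/wormbase_${RELEASE} per the Output line.

```shell
#!/usr/bin/env bash
# Hypothetical check: report any of the expected AceDB subdirectories that
# are missing. Returns 0 if all are present, 1 otherwise.
check_acedb_layout() {
    local acedb_dir="$1" sub missing=0
    for sub in database wgf wquery wspec; do
        if [ ! -d "$acedb_dir/$sub" ]; then
            echo "missing: $acedb_dir/$sub" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_acedb_layout "$ACEDB/wormbase_$RELEASE"`.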
It is also a good idea to confirm that the database works: try connecting to AceDB with a test script that creates a database handle. If the connection fails, you may need to restart the database:
> ps ax | grep acedb                ## get the AceDB process number
> kill -9 {AceDB proc number}       ## stop the current AceDB process
> sudo /etc/init.d/xinetd restart   ## restart xinetd
> saceclient localhost -port 2005   ## confirm you can connect
Create BLAST databases
Build BLAST databases. We automatically build nucleotide and protein BLAST databases for species with genomic sequence and conceptual translations. In addition, for C. elegans and C. briggsae, we build BLAST databases of ESTs and genes.
Usage  : ./steps/create_blast_databases.pl ${RELEASE}
Input  : Genomic sequence and protein FASTA files staged at:
           ${FTP}/releases/species/${SPECIES}.${RELEASE}.genomic.fa.gz
           ${FTP}/releases/species/${SPECIES}.${RELEASE}.protein.fa.gz
         Gene and EST sequences derived from AceDB
Output : BLAST databases in ${WORMBASE}/databases/${RELEASE}/blast/${SPECIES}
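Since this step needs both a genomic and a protein FASTA file per species, a pre-flight check can save a failed run. This hypothetical sketch assumes the per-species path pattern used by the BLAT and GFF steps (${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.<kind>.fa.gz), which includes a ${SPECIES}/ directory that the Input line above omits; adjust the path if your staging layout differs. It does not check the gene and EST sequences derived from AceDB.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print each expected FASTA input that is
# missing or empty for one species; prints nothing when all inputs exist.
# Path layout is an assumption borrowed from the BLAT/GFF steps.
missing_fasta_inputs() {
    local release="$1" species="$2"
    local ftp="${3:-/usr/local/ftp/pub/wormbase}"
    local kind f
    for kind in genomic protein; do
        f="$ftp/releases/species/$species/$species.$release.$kind.fa.gz"
        # -s: true only if the file exists and is not empty
        [ -s "$f" ] || echo "$f"
    done
}
```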
Create BLAT databases
Build BLAT databases of genomic sequence.
Usage  : ./steps/create_blat_databases.pl ${RELEASE}
Input  : Genomic sequence FASTA files staged at ${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.genomic.fa.gz
Output : BLAT .nib files in ${WORMBASE}/databases/${RELEASE}/blat/${SPECIES}
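BLAT databases are built only for species that have genomic sequence. A hypothetical helper to see which species have a non-empty genomic FASTA file staged for a release, assuming the Input path pattern shown above; the function name is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list species with a genomic FASTA file staged for
# RELEASE, assuming the layout
#   ${FTP}/releases/species/<species>/<species>.<release>.genomic.fa.gz
species_with_genome() {
    local release="$1"
    local ftp="${2:-/usr/local/ftp/pub/wormbase}"
    local dir species
    for dir in "$ftp"/releases/species/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        species=$(basename "$dir")         # basename drops the trailing slash
        if [ -s "$dir$species.$release.genomic.fa.gz" ]; then
            echo "$species"
        fi
    done
}
```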
Load genomic GFF annotations
Convert GFF files into Bio::DB::GFF (GFF2) or Bio::DB::SeqFeature::Store (GFF3) databases.
Usage : ./steps/load_genomic_gff_databases.pl ${RELEASE}
Input : GFF and FASTA files staged at:
  GFF : ${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.gff[2|3].gz
  DNA : ${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.genomic.fa.gz
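The GFF input may be either a .gff2.gz or a .gff3.gz file, and the version determines the storage backend (Bio::DB::GFF for GFF2, Bio::DB::SeqFeature::Store for GFF3). The hypothetical helper below reports which backend a species would use, assuming the file name pattern above. Preferring GFF3 when both files are present is our assumption, not documented pipeline behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the BioPerl backend matching the GFF file staged
# for a species. Prefers GFF3 if both versions exist (an assumption).
# Returns 1 if neither file exists.
gff_backend() {
    local release="$1" species="$2"
    local ftp="${3:-/usr/local/ftp/pub/wormbase}"
    local base="$ftp/releases/species/$species/$species.$release"
    if [ -e "$base.gff3.gz" ]; then
        echo "Bio::DB::SeqFeature::Store"
    elif [ -e "$base.gff2.gz" ]; then
        echo "Bio::DB::GFF"
    else
        echo "gff_backend: no GFF file for $species $release" >&2
        return 1
    fi
}
```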
Compile Ontology Resources
Compile the mirrored ontology files into the databases used for the site's ontology searches.
Usage  : ./steps/compile_ontology_resources.pl ${RELEASE}
Input  : OBO files staged at /usr/local/ftp/pub/wormbase/releases/WSXXX/ONTOLOGY
         Compiled data files from the Compile Gene Resources step
Output : Files in ${WORMBASE}/databases/${RELEASE}/ontology:
- anatomy_association.RELEASE.wb
- gene_association.RELEASE.wb.ce
- gene_ontology.RELEASE.obo
- name2id.txt
- search_data.txt
- anatomy_ontology.RELEASE.obo
- gene_association.RELEASE.wb.cjp
- id2association_counts.txt
- parent2ids.txt
- gene_association.RELEASE.wb
- gene_association.RELEASE.wb.ppa
- id2name.txt
- phenotype_association.RELEASE.wb
- gene_association.RELEASE.wb.cb
- gene_association.RELEASE.wb.rem
- id2parents.txt
- phenotype_ontology.RELEASE.obo
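After this step finishes, the list above can be used as a checklist. The hypothetical function below prints each expected file that is missing from the output directory, replacing the literal token RELEASE in a file name with the release being staged; the name and argument order are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical post-step check: print each expected ontology output missing
# from DIR. The token RELEASE in a name is replaced by the staged release
# (e.g. gene_ontology.RELEASE.obo -> gene_ontology.WS999.obo).
missing_ontology_outputs() {
    local release="$1" dir="$2" name
    for name in \
        anatomy_association.RELEASE.wb anatomy_ontology.RELEASE.obo \
        gene_association.RELEASE.wb gene_association.RELEASE.wb.cb \
        gene_association.RELEASE.wb.ce gene_association.RELEASE.wb.cjp \
        gene_association.RELEASE.wb.ppa gene_association.RELEASE.wb.rem \
        gene_ontology.RELEASE.obo phenotype_association.RELEASE.wb \
        phenotype_ontology.RELEASE.obo id2association_counts.txt \
        id2name.txt id2parents.txt name2id.txt parent2ids.txt search_data.txt
    do
        name="${name//RELEASE/$release}"   # bash pattern substitution
        [ -e "$dir/$name" ] || echo "$name"
    done
}
```

For example: `missing_ontology_outputs "$RELEASE" "$WORMBASE/databases/$RELEASE/ontology"` prints nothing when all 17 files are present.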
Update Steps
Compile Gene Resources
Create precompiled gene page files specifically to populate the Phenotype tables.
Usage  : ./steps/compile_gene_resource.pl ${RELEASE}
Input  : AceDB data
Output : Files in ${WORMBASE}/databases/${RELEASE}/gene:
- gene_rnai_pheno.txt
- gene_xgene_pheno.txt
- phenotype_id2name.txt
- rnai_data.txt
- variation_data.txt
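An empty output file is an easy failure to miss, so it can help to summarize the gene resource files after the step runs. A hypothetical sketch that prints a line count per expected file and flags any that are missing or empty; the function name is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical post-step summary: print the line count of each expected gene
# resource file in DIR, flagging files that are missing or empty.
# Returns 1 if any file is missing or empty.
summarize_gene_outputs() {
    local dir="$1" name lines status=0
    for name in gene_rnai_pheno.txt gene_xgene_pheno.txt \
                phenotype_id2name.txt rnai_data.txt variation_data.txt; do
        if [ ! -e "$dir/$name" ]; then
            echo "$name MISSING"; status=1
        elif [ ! -s "$dir/$name" ]; then
            echo "$name EMPTY"; status=1
        else
            # Read from stdin so wc prints only the count, not the file name;
            # $((lines)) strips any padding some wc versions add.
            lines=$(wc -l < "$dir/$name")
            echo "$name $((lines)) lines"
        fi
    done
    return "$status"
}
```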
Compile Orthology Resources
Create precompiled files for orthology and disease display and search.
Usage  : ./steps/compile_gene_data.pl ${RELEASE}
         ./steps/compile_ortholog_data.pl ${RELEASE}
         ./steps/compile_orthology_resources.pl ${RELEASE}
Input  : AceDB data, the omim.txt and morbidmap files from OMIM, and the ontology resource files
Output : Files in ${WORMBASE}/databases/${RELEASE}/orthology
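This step runs three scripts in sequence. Assuming (this is not stated above) that each script depends on the output of the previous one, a manual run should stop at the first failure. A minimal sketch of such a runner; run_orthology_steps is a hypothetical name, and it does not reproduce the pipeline harness's logging.

```shell
#!/usr/bin/env bash
# Hypothetical runner: execute the three orthology scripts in the order listed
# on this page, stopping at the first one that exits non-zero.
run_orthology_steps() {
    local release="$1"
    local steps_dir="${2:-./steps}"
    local script
    for script in compile_gene_data.pl compile_ortholog_data.pl \
                  compile_orthology_resources.pl; do
        echo "running $script $release" >&2
        if ! "$steps_dir/$script" "$release"; then
            echo "run_orthology_steps: $script failed; stopping" >&2
            return 1
        fi
    done
}
```

For example, from the update directory: `run_orthology_steps "$RELEASE"`.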
- all_proteins.txt
- disease_page_data.txt
- disease_search_data.txt
- full_disease_data.txt
- gene_association.$RELEASE.wb.ce
- gene_id2go_bp.txt
- gene_id2go_mf.txt
- gene_id2omim_ids.txt
- gene_id2phenotype.txt
- gene_list.txt
- go_id2omim_ids.txt
- go_ids2descendants.txt
- hs_ensembl_id2omim.txt
- hs_proteins.txt
- id2name.txt
- last_processed_gene.txt
- name2id.txt
- omim2disease.txt
- omim_id2all_ortholog_data.txt
- omim_id2disease_desc.txt
- omim_id2disease_name.txt
- omim_id2disease_notes.txt
- omim_id2disease_synonyms.txt
- omim_id2disease.txt
- omim_id2gene_name.txt
- omim_id2go_ids.txt
- omim_id2phenotypes.txt
- omim_reconfigured.txt
- ortholog_other_data_hs_only.txt
- ortholog_other_data.txt
Compile Interaction Data
Create precompiled gene page files specifically to populate interaction listing pages.
Usage  : ./steps/compile_interaction_data.pl ${RELEASE}
Input  : AceDB interaction data
Output : Files in ${WORMBASE}/databases/${RELEASE}/interaction
- compiled_interaction_data.txt
Create ePCR databases for available species
Build ePCR databases for each species.
Usage  : ./steps/create_epcr_databases.pl ${RELEASE}
Input  : Genomic sequence files mirrored from Sanger to ${FTP}/genomes/${SPECIES}/sequences/dna/${SPECIES}.dna.fa.gz
Output : ePCR databases in ${WORMBASE}/databases/${RELEASE}/epcr
Build and Load GFF patches
Create and load a number of patches for the c_elegans GFF database, including protein motifs and genetic limits.
Usage  : ./steps/load_gff_patches.pl ${RELEASE}
Input  : Files created in ${FTP}/genomes/c_elegans/genome_feature_tables/GFF2
Output : The files created above, loaded into the c_elegans GFF database
Convert GFF2 into GFF3
Notes...
Usage: ./steps/convert_gff2_to_gff3.pl ${RELEASE}
Create a GBrowse-driven genetic map
Notes...
Usage: ./steps/load_gmap_gffdb.pl ${RELEASE}
Create a GBrowse-driven physical map
Notes...
Usage: ./steps/load_pmap_gffdb.pl ${RELEASE}
Create dump files of common datasets
Notes...
Load the CLUSTALW database
Notes...
Usage: ./steps/load_clustal_db.pl ${RELEASE}
Mirror annotation files from Sanger to the FTP site
Notes...
Usage: ./steps/mirror_annotations.pl ${RELEASE}