Software Life Cycle: 1. Updating The Development Server


Overview

This document describes the process of staging a new release of WormBase on the development server.

The automated staging pipeline consists of:

  • a harness that handles logging, error trapping, and basic shared functions
  • a suite of modules -- one per step -- each implementing its step directly or calling helper scripts
  • helper scripts in Perl or shell that assist in implementation

You can control the pipeline in several ways, as illustrated below:

  • Launch the full pipeline via the control script, the preferred and automated method.
  • Run individual steps in the context of the pipeline using control scripts in steps/, useful if the pipeline fails at a specific point.
  • Directly run helper scripts outside of the logging facilities of the pipeline, useful if you need to rebuild something quickly.
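For example (using an illustrative release number):

 # 1. Full pipeline via the control script (preferred):
 ./stage_via_pipeline.pl WS220

 # 2. A single step, with the pipeline's logging and error trapping:
 ./steps/create_blast_databases.pl WS220

 # 3. A helper script directly, outside the pipeline's logging:
 helpers/unpack_acedb.sh WS220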

Document Conventions

The current development server is

 wb-dev: wb-dev.oicr.on.ca (FQDN); aka: dev.wormbase.org

Where WSXXX or ${RELEASE} appears, substitute the actual release version (e.g., WS220).

System paths referred to in this document:

      FTP : /usr/local/ftp/pub/wormbase
 WORMBASE : /usr/local/wormbase
    ACEDB : /usr/local/wormbase/acedb
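When following the examples below, these can be exported as shell variables (the release number is illustrative):

 export RELEASE=WS220
 export FTP=/usr/local/ftp/pub/wormbase
 export WORMBASE=/usr/local/wormbase
 export ACEDB=${WORMBASE}/acedb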

Staging Pipeline Code

The update pipeline code is available in the website-admin module on GitHub:

tharris> git clone git@github.com:WormBase/website-admin.git
tharris> cd website-admin/update

 lib/         -- the shared library suite that handles updates.
 staging/     -- code related to staging data on the development site.
 production/  -- code related to releasing data/code into production.

The contents are:

  logs/           -- the logs directory for each step/update
  bin/            -- Perl scripts for manually launching individual steps
  README.txt      -- directory listing
  updatelog.conf  -- a configuration file for the update process
  update.sh       -- master script that fires off each step of the pipeline
  util/           -- various helper scripts for the update process

Running the Update Pipeline

Log Files

The Staging Pipeline creates informative logs for each step of the process. Logs are located at:

 /usr/local/wormbase/logs/staging_updates/WSXXX

The master log is useful for a meta-view of the process. It can be found at:

 /usr/local/wormbase/logs/staging_updates/WSXXX/master.log

Each step also writes its own log files, capturing STDOUT and STDERR from the pipeline. These are useful for tracking progress and triaging problems. For example:

 /usr/local/wormbase/logs/staging_updates/WSXXX/build_blast_databases/
      step.log  -- STDOUT; progress of the step.
      step.err  -- STDERR; errors and warnings generated during the step.
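To quickly triage a failed step, search its error log; for example:

 grep -i error /usr/local/wormbase/logs/staging_updates/WSXXX/build_blast_databases/step.err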

Dry Run

A single control script fires off all steps of the process. Run it inside a screen session so the update survives a dropped connection.

 tharris> screen
 tharris> ./stage_via_pipeline.pl WSXXX
   (to disconnect your screen)
 tharris> ^a ^d
   (to resume your screen)
 tharris> screen -r   
 

Monitor progress of the update by following the master log file:

tharris> tail -f /usr/local/wormbase/logs/staging_updates/WSXXX/master.log


 See the GNU screen documentation for a full command reference.

Update Steps

The steps that comprise the pipeline, the control script that launches each one, any helper scripts involved, and the implementing module are listed below.

 Step                        Control script                        Helper scripts                          Module
 Mirror a new release        steps/mirror_new_release.pl (manual)  --                                      W::U::Staging::MirrorNewRelease
 Unpack AceDB                steps/unpack_acedb.pl (manual)        --                                      W::U::Staging::UnpackAcedb
 Create directories          steps/create_directories.pl           --                                      W::U::Staging::CreateDirectories
 Create BLAST databases      steps/create_blast_databases.pl       helpers/create_blastdb_nucleotide.sh,   W::U::Staging::CreateBlastDatabases
                                                                   helpers/create_blastdb_protein.sh
 Create BLAT databases       steps/create_blat_databases.pl        --                                      W::U::Staging::CreateBlatDatabases
 Load genomic GFF databases  steps/load_genomic_gff_databases.pl   --                                      W::U::Staging::LoadGenomicGFFDatabases

The remaining steps, each described in detail below, are:

  • Compile Gene Resources
  • Mirror ontology files from Sanger
  • Compile ontology resources for the site
  • Compile orthology resources
  • Compile interaction resources
  • Create ePCR databases for select species
  • Build and load GFF patches
  • Convert GFF2 into GFF3
  • Create a GBrowse-driven genetic map
  • Create a GBrowse-driven physical map
  • Update strains database
  • Create dump files of common datasets
  • Mirror annotation files from Sanger to the FTP site
  • Load CLUSTAL db


Mirror a new release

New releases are mirrored directly from the Hinxton FTP site to the primary WormBase FTP site hosted on wb-dev:/usr/local/ftp. This process is run via cron but can also be run manually.

 # Mirror the next incremental release newer than what we already have:
 ./steps/mirror_new_release.pl

 # Or mirror a specific release, e.g. WS150, to /usr/local/ftp/pub/wormbase/releases/WS150:
 ./steps/mirror_new_release.pl WS150
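A hypothetical crontab entry for the automated mirror (the schedule and checkout path are assumptions; adjust to the local installation):

 # check Hinxton for a new release every night at 2 AM
 0 2 * * * cd /usr/local/wormbase/website-admin/update/staging && ./steps/mirror_new_release.pl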

Create necessary directories

Create staging directories for the update process.

  Usage : ./steps/create_directories.pl ${RELEASE}
 Output : Directories in ${WORMBASE}/databases/${RELEASE}
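Based on the Output paths listed in the steps below, the created tree looks roughly like this (a sketch, not an exhaustive listing):

 ${WORMBASE}/databases/${RELEASE}/
    blast/   blat/   epcr/   gene/
    interaction/   ontology/   orthology/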

Unpack AceDB

Unpack AceDB from the new release. Customize the new installation with skeletal files located at /usr/local/wormbase/website/classic/wspec. Files will be unpacked to ${ACEDB}. Please make sure that there is sufficient space on this partition! You will need approximately 25 GB of disk space per release.

 via pipeline : ./steps/unpack_acedb.pl ${RELEASE}
 via helper   : helpers/unpack_acedb.sh ${RELEASE}
  Input : Files staged at ${FTP}/releases/${RELEASE}/species
 Output : Unpacked AceDB files in ${ACEDB}/wormbase_${RELEASE}

When complete, a new AceDB directory, wormbase_${RELEASE}, should exist, containing:

   -- database
   -- wgf
   -- wquery
   -- wspec

It is also good to verify that the database is functional -- for example, by connecting to AceDB with a test script that creates a database handle. It may be necessary to restart the database first:

> ps -ax | grep acedb ## to get acedb process number
> kill -9 {AceDB proc number} ## stop current acedb process
> sudo /etc/init.d/xinetd restart 
> saceclient localhost -port 2005
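A minimal sanity check, assuming the server runs as sgifaceserver under xinetd on port 2005 (adjust names to the local setup):

 pgrep -fl sgifaceserver                            # is an AceDB server process running?
 nc -z localhost 2005 && echo "AceDB is listening"  # is anything accepting connections on the AceDB port?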

Create BLAST databases

Build BLAST databases. We automatically build nucleotide and protein BLAST databases for species with genomic sequence and conceptual translations. In addition, for C. elegans and C. briggsae, we build BLAST databases for ESTs and genes.

  Usage : ./steps/create_blast_databases.pl ${RELEASE}
  Input : Genomic sequence and protein FASTA files staged at:
             ${FTP}/releases/${RELEASE}/species/${SPECIES}.genomic.fa.gz
             ${FTP}/releases/${RELEASE}/species/${SPECIES}.protein.fa.gz
          Gene and EST sequences derived from AceDB
 Output : BLAST databases in ${WORMBASE}/databases/${RELEASE}/blast/${SPECIES}
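For reference, the nucleotide build amounts to roughly the following (a minimal sketch assuming the classic NCBI formatdb tool; the actual helpers/create_blastdb_nucleotide.sh may differ):

 # unpack the staged genomic sequence, then index it as a nucleotide (-p F) BLAST database
 gunzip -c ${FTP}/releases/${RELEASE}/species/${SPECIES}.genomic.fa.gz > ${SPECIES}.genomic.fa
 formatdb -p F -i ${SPECIES}.genomic.fa -n ${SPECIES}.genomic -t "${SPECIES} ${RELEASE} genomic"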

Compile Gene Resources

Create precompiled gene page files specifically to populate the Phenotype tables.

  Usage : ./steps/compile_gene_resource.pl ${RELEASE}
  Input : AceDB data
 Output : Files in ${WORMBASE}/databases/${RELEASE}/gene:
  • gene_rnai_pheno.txt
  • gene_xgene_pheno.txt
  • phenotype_id2name.txt
  • rnai_data.txt
  • variation_data.txt

Mirror ontology from Sanger

Mirror OBO files from Sanger. These are necessary for the ontology searches.

  Usage : ./steps/mirror_ontology_files.pl ${RELEASE}
  Input : none
 Output : Files mirrored to ${WORMBASE}/databases/${RELEASE}/ontology

Compile Ontology Resources

Take the mirrored files and compile them into the databases for the ontology searches.

  Usage : ./steps/compile_ontology_resources.pl ${RELEASE}
  Input : OBO files mirrored earlier in ${WORMBASE}/databases/${RELEASE}/ontology; 
          compiled data files from Compile Gene Resources step
 Output : Files in ${WORMBASE}/databases/${RELEASE}/ontology:


  • anatomy_association.RELEASE.wb
  • gene_association.RELEASE.wb.ce
  • gene_ontology.RELEASE.obo
  • name2id.txt
  • search_data.txt
  • anatomy_ontology.RELEASE.obo
  • gene_association.RELEASE.wb.cjp
  • id2association_counts.txt
  • parent2ids.txt
  • gene_association.RELEASE.wb
  • gene_association.RELEASE.wb.ppa
  • id2name.txt
  • phenotype_association.RELEASE.wb
  • gene_association.RELEASE.wb.cb
  • gene_association.RELEASE.wb.rem
  • id2parents.txt
  • phenotype_ontology.RELEASE.obo

Compile Orthology Resources

Create precompiled files for orthology and disease display and search.

  Usage : ./steps/compile_gene_data.pl ${RELEASE}
          ./steps/compile_ortholog_data.pl ${RELEASE}
          ./steps/compile_orthology_resources.pl ${RELEASE}
  Input : AceDB data, omim.txt and morbidmap files from OMIM, ontology resource files
 Output : Files in ${WORMBASE}/databases/${RELEASE}/orthology:
  • all_proteins.txt
  • disease_page_data.txt
  • disease_search_data.txt
  • full_disease_data.txt
  • gene_association.$RELEASE.wb.ce
  • gene_id2go_bp.txt
  • gene_id2go_mf.txt
  • gene_id2omim_ids.txt
  • gene_id2phenotype.txt
  • gene_list.txt
  • go_id2omim_ids.txt
  • go_ids2descendants.txt
  • hs_ensembl_id2omim.txt
  • hs_proteins.txt
  • id2name.txt
  • last_processed_gene.txt
  • name2id.txt
  • omim2disease.txt
  • omim_id2all_ortholog_data.txt
  • omim_id2disease_desc.txt
  • omim_id2disease_name.txt
  • omim_id2disease_notes.txt
  • omim_id2disease_synonyms.txt
  • omim_id2disease.txt
  • omim_id2gene_name.txt
  • omim_id2go_ids.txt
  • omim_id2phenotypes.txt
  • omim_reconfigured.txt
  • ortholog_other_data_hs_only.txt
  • ortholog_other_data.txt

Compile Interaction Data

Create precompiled gene page files specifically to populate interaction listing pages.

  Usage : ./steps/compile_interaction_data.pl ${RELEASE}
  Input : AceDB interaction data
 Output : Files in ${WORMBASE}/databases/${RELEASE}/interaction:
  • compiled_interaction_data.txt

Create BLAT databases for available species

Build BLAT databases of genomic sequence for each available species.

  Usage : ./steps/create_blat_databases.pl ${RELEASE}
  Input : Genomic sequence FASTA files mirrored from Sanger to
             ${FTP}/genomes/${SPECIES}/sequences/dna/${SPECIES}.dna.fa.gz
 Output : BLAT .nib files in ${WORMBASE}/databases/${RELEASE}/blat
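For reference, building .nib files amounts to roughly the following (a minimal sketch assuming the UCSC faSplit and faToNib utilities; each .nib file holds a single sequence):

 # split the genome into one FASTA file per sequence, then convert each to .nib
 gunzip -c ${SPECIES}.dna.fa.gz > ${SPECIES}.dna.fa
 faSplit byname ${SPECIES}.dna.fa nib_tmp/
 for f in nib_tmp/*.fa; do
     faToNib $f ${WORMBASE}/databases/${RELEASE}/blat/$(basename $f .fa).nib
 done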

Create ePCR databases for available species

Build ePCR databases for each species.

  Usage : ./steps/create_epcr_databases.pl ${RELEASE}
  Input : Genomic sequence files mirrored from Sanger to
             ${FTP}/genomes/${SPECIES}/sequences/dna/${SPECIES}.dna.fa.gz
 Output : ePCR databases in ${WORMBASE}/databases/${RELEASE}/epcr
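If built by hand, the databases amount to roughly the following (a sketch assuming the NCBI e-PCR famap/fahash tools; flags and word size are illustrative):

 # build the memory-mapped sequence file, then the word hash that e-PCR searches against
 famap -tN -b ${SPECIES}.famap ${SPECIES}.dna.fa
 fahash -b ${SPECIES}.hash -w 12 ${SPECIES}.famap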

Load genomic GFF DBs for available species

Fetch genomic GFF files from Sanger and load them into the databases.

  Usage : ./steps/mirror_genomic_gffdb.pl ${RELEASE}
          ./steps/process_genomic_gff_files.pl ${RELEASE}
          ./steps/load_genomic_gffdb.pl ${RELEASE}
  Input : GFF and FASTA files mirrored from Sanger to
            GFF : ${FTP}/${SPECIES}/genome_feature_tables/GFF2/${SPECIES}.${VERSION}.gff.gz
            DNA : ${FTP}/${SPECIES}/sequences/dna/${SPECIES}.${VERSION}.dna.fa.gz
 Output : Loaded genomic GFF databases in MySQL (these scripts both mirror and consume the files above).
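A minimal sketch of the kind of load the scripts perform, assuming BioPerl's bp_bulk_load_gff.pl and the database naming seen in the troubleshooting notes below:

 # unpack the mirrored files, then create (-c) and bulk-load a Bio::DB::GFF database
 gunzip -c ${SPECIES}.${VERSION}.gff.gz > ${SPECIES}.gff
 gunzip -c ${SPECIES}.${VERSION}.dna.fa.gz > ${SPECIES}.fa
 bp_bulk_load_gff.pl -c -d ${SPECIES}_${RELEASE} --user root --fasta ${SPECIES}.fa ${SPECIES}.gff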

Troubleshooting notes:

  • File and directory names need to be consistent with the format specified in Update.pm circa line 36.
  • If files were incorrectly named, download them manually from the source site, uncompress them, and rename them correctly.
  • Progress can be monitored by following the log file ..admin/development/logs/{WSRELEASE}/load genomic feature gff databases.log and by watching the MySQL database files appear in /usr/local/mysql/data.
  • The C. elegans build is particularly complex. One known blocker is the permissions on the /usr/local/mysql/data/c_elegans_{RELEASE} directory, which should be set to 775.
  • Grant read access for web requests: mysql -u root -p -e 'grant select on *.* to "www-data"@localhost'
  • To restart MySQL and Apache: sudo /etc/init.d/mysql restart; sudo /etc/init.d/httpd graceful

Build and Load GFF patches

Create and load a number of patches for the c_elegans GFF database, including protein motifs and genetic limits.

  Usage : ./steps/load_gff_patches.pl ${RELEASE}
  Input : Patch files written to ${FTP}/genomes/c_elegans/genome_feature_tables/GFF2
 Output : The patch files above, loaded into the c_elegans GFF database.

Convert GFF2 into GFF3

Notes...

 Usage: ./steps/convert_gff2_to_gff3.pl ${RELEASE}

Create a GBrowse-driven genetic map

Notes...

 Usage: ./steps/load_gmap_gffdb.pl ${RELEASE}

Create a GBrowse-driven physical map

Notes...

 Usage: ./steps/load_pmap_gffdb.pl ${RELEASE}

Create dump files of common datasets

Notes...


Load the CLUSTALW database

Notes...

 Usage: ./steps/load_clustal_db.pl ${RELEASE}

Mirror annotation files from Sanger to the FTP site

Notes...

 Usage: ./steps/mirror_annotations.pl ${RELEASE}

Compiled File Table

Update Records

  • Update Matrix WS205
  • Update Matrix WS206
  • Update Matrix WS207
  • Update Matrix WS208
  • Update Matrix WS209
  • Update Matrix WS210
  • Update Matrix WS211
  • Update Matrix WS212
  • Update Matrix WS213
  • Update Matrix WS214
  • Update Matrix WS215
  • Update Matrix WS216
  • Update Matrix WS217
  • Update Matrix WS220