Software Life Cycle: 1. Updating The Development Server

This is a quick description of how to update the development server with a new release of the database. In general, the development process involves mirroring a large number of files from Sanger, unpacking them, and in many cases massaging them into a format suitable for driving the website.

Development Server

The current development server is

 wb-dev: wb-dev.oicr.on.ca / dev.wormbase.org

Wherever ${RELEASE} appears, substitute the release identifier (WSXXX).
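
Since nearly every step takes the release identifier as its argument, it can help to check the value before starting. The sketch below assumes the XXX in WSXXX is always a run of digits; the function name is_release_id is made up for illustration and is not part of the pipeline.

```shell
# is_release_id ID
# Succeeds only if ID looks like "WS" followed by one or more digits.
# ASSUMPTION: release identifiers are always WS plus digits.
is_release_id() {
    case $1 in
        WS|WS*[!0-9]*) return 1 ;;   # bare "WS", or a non-digit after it
        WS[0-9]*)      return 0 ;;   # WS followed only by digits
        *)             return 1 ;;   # anything else
    esac
}
```

For example, `is_release_id "$RELEASE" || echo "bad release: $RELEASE"` warns before a step is launched with a mistyped argument.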

System paths referred to in this document:

      FTP : ~ftp/pub/wormbase (on brie3 this is /usr/local/)
 WORMBASE : /usr/local/wormbase
    ACEDB : /usr/local/wormbase/acedb

NB:

The updater's umask setting may be important, given that files are moved across the system. It is best to set it to 002.

The status of each step can be gauged by running the command > ps ax | grep perl
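
A minimal sketch of both notes, assuming a POSIX shell. The helper names new_file_mode and update_steps_running are hypothetical; only umask, ps and grep come from the notes above.

```shell
# Make new files group-writable for the rest of this shell session.
umask 002

# Print the permission string a newly created file gets under the
# current umask (expected to be -rw-rw-r-- when the umask is 002).
new_file_mode() {
    tmp=$(mktemp -d) || return 1
    touch "$tmp/probe"
    ls -l "$tmp/probe" | cut -c1-10
    rm -rf "$tmp"
}

# List running Perl processes. Writing the pattern as [p]erl keeps
# grep from matching its own command line.
update_steps_running() {
    ps ax | grep '[p]erl'
}

new_file_mode
```

The umask only applies to the shell it is set in, so it needs to be set again in each new login or screen session.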

Update Pipeline Code

The update pipeline code is available in the WormBase admin module:

> hg (complete mercurial line command ...)

Change into the development directory:

> cd admin/update/development

The contents are:

  logs/      -- the logs directory for each step/update
  README.txt -- nothing 
  steps/     -- Perl scripts that launch each step
  Update.pm  -- the top level module for the update process
  Update/    -- Perl modules corresponding to each step
  updatelog.conf  -- a configuration file for the update process
  update.sh  -- master script that fires off each step of the pipeline
  util/       -- various helper scripts for the update process

A single shell script fires off all steps of the process.

> ./update.sh

However, it is best to run each step individually, periodically checking the log to monitor progress.

> ./steps/<individual_update_script>.pl
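
When running steps one at a time, a small wrapper can make it obvious when a step started, when it finished, and whether it failed. This is only a sketch: run_step is a hypothetical helper, and it assumes each step script takes the release as its single argument and returns a non-zero exit status on failure.

```shell
# run_step STEP_SCRIPT RELEASE
# Runs one step script with the release as its argument and reports
# the exit status. Returns 2 if the script is missing or not executable.
run_step() {
    script=$1
    release=$2
    if [ ! -x "$script" ]; then
        echo "missing or not executable: $script" >&2
        return 2
    fi
    echo "[$(date '+%H:%M:%S')] starting $script $release"
    "$script" "$release"
    rc=$?
    echo "[$(date '+%H:%M:%S')] finished $script (exit $rc)"
    return $rc
}
```

Used as `run_step ./steps/mirror_acedb.pl WSXXX && run_step ./steps/unpack_acedb.pl WSXXX`, the second step only starts if the first one succeeded.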

The steps to perform are:

Purge disk space
Create necessary directories
Mirror and unpack ACeDB from Sanger
Compile Gene Resources
Mirror ontology files from Sanger
Compile ontology resources for the site
Compile orthology resources
Compile interaction resources
Create BLAST databases for available species
Create BLAT databases for available species
Create ePCR databases for select species
Load genomic GFF databases for available species
Build and load GFF patches
Convert GFF2 into GFF3
Create a GBrowse-driven genetic map
Create a GBrowse-driven physical map
Update strains database
Create dump files of common datasets
Mirror annotation files from Sanger to the FTP site
Load CLUSTAL db

Each step is described below.

Check logs and resolve issues

Log files are under admin/development/logs, and the logging levels are set to record ERRORS and WARNINGS.

NB: While the logs provide excellent records of the loading process, it is also important to check that the web pages that use the data function properly; links to these pages are provided with the description of each step.
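
A quick way to pull problem lines out of the logs might look like the sketch below. It assumes the log files end in .log and that the levels appear literally as ERROR or WARN in each line; neither is confirmed here, so adjust the pattern to the real log format. scan_logs is a hypothetical helper name.

```shell
# scan_logs LOG_DIR
# Prints file:line:text for every line mentioning ERROR or WARN in any
# *.log file under LOG_DIR. /dev/null is passed as an extra file so
# grep always prefixes matches with the file name.
scan_logs() {
    find "$1" -type f -name '*.log' \
        -exec grep -nE 'ERROR|WARN' /dev/null {} +
}
```

For example, `scan_logs logs/` run from the development directory lists the problems recorded so far across all steps.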

Update Steps

Purge Disk Space

Remove obsolete files from the (staging) FTP site on the development site. These have already been mirrored to the production FTP site and do not need to be maintained on the development server.

 Usage  : ./steps/purge_disk_space.sh

Create necessary directories

Create staging directories for the update process.

  Usage : ./steps/create_directories.pl ${RELEASE}
 Output : Directories in ${WORMBASE}/databases (primarily)
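
For orientation, the staging layout can be sketched by hand as below. The subdirectory names are taken from the Output lines of the later steps in this document; whether create_directories.pl creates exactly this set (or others as well) is an assumption, and create_staging_dirs is a hypothetical name.

```shell
# create_staging_dirs WORMBASE_ROOT RELEASE
# Creates WORMBASE_ROOT/databases/RELEASE/<subdir> for each data type
# that a later step writes to. ASSUMPTION: this list mirrors what the
# real create_directories.pl step produces.
create_staging_dirs() {
    wormbase=$1
    release=$2
    for sub in gene ontology orthology interaction blast blat epcr; do
        mkdir -p "$wormbase/databases/$release/$sub" || return 1
    done
}
```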

Mirror and unpack ACeDB from Sanger

Mirror and unpack the new release of the database from Sanger. Add the appropriate control files for the new acedb database (serverpasswd.wrm, passwd.wrm, serverconfig.wrm), pulled from the checked-out development source (/usr/local/wormbase/wspec).

Files will be mirrored and unpacked to /usr/local. Please make sure that there is sufficient space in this directory! You will most likely need approximately 25 GB of disk space. Possible places to free up disk space:

 /usr/local/mysql/data
 /usr/local/wormbase/acedb/tmp
 ~{you}/mp3s
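
The free space can be checked before mirroring with something like the sketch below. has_free_space is a hypothetical helper; it relies only on the POSIX output format of df -Pk, and the 25 GB figure is the estimate given above.

```shell
# has_free_space DIR GB
# Succeeds if the filesystem holding DIR has at least GB gigabytes
# available. df -Pk prints sizes in 1024-byte blocks, with the
# available space in the fourth column of the second line.
has_free_space() {
    dir=$1
    need_gb=$2
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, `has_free_space /usr/local 25 || echo "free up some space first"`.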

  Usage : ./steps/mirror_acedb.pl ${RELEASE}
          ./steps/unpack_acedb.pl ${RELEASE}
  Input : Files mirrored from Sanger to ${ACEDB}/tmp
 Output : Unpacked AceDB files in ${ACEDB}/wormbase_${RELEASE}

Note: This can take a *long* time. You might want to run this in a screen session:

 > screen
 > ./steps/mirror_acedb.pl WSXXX
   (to disconnect your screen)
 > ^a ^d
   (to resume your screen)
 > screen -r   
 
 screen command reference

When complete:

 > a new acedb directory should have been created: wormbase_${RELEASE}. It should contain the subdirs:
   -- database
   -- wgf
   -- wquery
   -- wspec
 > check that the following directory and symlink exist: ${ACEDB}/wormbase -> wormbase_${RELEASE}
 > check that the correct configuration files are in wspec: serverconfig.wrm, server.wrm, serverpasswd.wrm
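
The directory and symlink checks above can be sketched as a single function. verify_acedb_install is a hypothetical helper, not part of the pipeline; it checks only what the list above describes and does not look at the contents of the configuration files.

```shell
# verify_acedb_install ACEDB_DIR RELEASE
# Checks that ACEDB_DIR/wormbase_RELEASE has the four expected subdirs
# and that ACEDB_DIR/wormbase is a symlink pointing at it.
verify_acedb_install() {
    acedb=$1
    release=$2
    target="$acedb/wormbase_$release"
    bad=0
    for sub in database wgf wquery wspec; do
        if [ ! -d "$target/$sub" ]; then
            echo "missing directory: $target/$sub" >&2
            bad=1
        fi
    done
    if [ ! -L "$acedb/wormbase" ]; then
        echo "missing symlink: $acedb/wormbase" >&2
        bad=1
    elif [ "$(basename "$(readlink "$acedb/wormbase")")" != "wormbase_$release" ]; then
        # Compare by base name so relative and absolute link targets both pass.
        echo "symlink does not point at wormbase_$release" >&2
        bad=1
    fi
    return $bad
}
```

For example, `verify_acedb_install /usr/local/wormbase/acedb WSXXX` prints one line per problem and returns non-zero if anything is missing.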

It is also good to check for a functional db -- try to connect to the acedb via a test script that creates a db handle. It may be necessary to restart the database:

> ps -ax | grep acedb ## to get acedb process number
> kill -9 {AceDB proc number} ## stop current acedb process
> sudo /etc/init.d/xinetd restart 
> ## run command for test script ##

Troubleshooting notes:

gzip version incompatibilities may break the step.

Compile Gene Resources

Create precompiled gene page files specifically to populate the Phenotype tables.

  Usage : ./steps/compile_gene_resource.pl ${RELEASE}
  Input : AceDB data
 Output : Files in ${WORMBASE}/databases/${RELEASE}/gene

gene_rnai_pheno.txt
gene_xgene_pheno.txt
phenotype_id2name.txt
rnai_data.txt
variation_data.txt
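
After a compile step finishes, its output files can be checked for presence and non-zero size. The helper below is a generic sketch (check_outputs is a hypothetical name); the same function can be pointed at the ontology, orthology, or interaction directories with their own file lists.

```shell
# check_outputs DIR FILE...
# Reports every listed FILE that is missing from DIR or empty, and
# returns non-zero if any were reported.
check_outputs() {
    dir=$1
    shift
    bad=0
    for f in "$@"; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            bad=1
        fi
    done
    return $bad
}
```

For this step: `check_outputs "${WORMBASE}/databases/${RELEASE}/gene" gene_rnai_pheno.txt gene_xgene_pheno.txt phenotype_id2name.txt rnai_data.txt variation_data.txt`.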

Mirror ontology from Sanger

Mirror OBO files from Sanger. These are necessary for the ontology searches.

  Usage : ./steps/mirror_ontology_files.pl ${RELEASE}
  Input : none
 Output : Files mirrored to ${WORMBASE}/databases/${RELEASE}/ontology

Compile Ontology Resources

Take the mirrored files and compile them into the databases for the ontology searches.

  Usage : ./steps/compile_ontology_resources.pl ${RELEASE}
  Input : OBO files mirrored earlier in ${WORMBASE}/databases/${RELEASE}/ontology; 
          compiled data files from Compile Gene Resources step
 Output : Files in ${WORMBASE}/databases/${RELEASE}/ontology:

anatomy_association.RELEASE.wb
gene_association.RELEASE.wb.ce
gene_ontology.RELEASE.obo
name2id.txt
search_data.txt
anatomy_ontology.RELEASE.obo
gene_association.RELEASE.wb.cjp
id2association_counts.txt
parent2ids.txt
gene_association.RELEASE.wb
gene_association.RELEASE.wb.ppa
id2name.txt
phenotype_association.RELEASE.wb
gene_association.RELEASE.wb.cb
gene_association.RELEASE.wb.rem
id2parents.txt
phenotype_ontology.RELEASE.obo

Compile Orthology Resources

Create precompiled files for orthology and disease display and search.

  Usage : ./steps/compile_gene_data.pl ${RELEASE}
          ./steps/compile_ortholog_data.pl ${RELEASE}
          ./steps/compile_orthology_resources.pl ${RELEASE}
  Input : AceDB data, omim.txt and morbidmap files from OMIM, ontology resources files
 Output : Files in ${WORMBASE}/databases/${RELEASE}/orthology

all_proteins.txt
disease_page_data.txt
disease_search_data.txt
full_disease_data.txt
gene_association.$RELEASE.wb.ce
gene_id2go_bp.txt
gene_id2go_mf.txt
gene_id2omim_ids.txt
gene_id2phenotype.txt
gene_list.txt
go_id2omim_ids.txt
go_ids2descendants.txt
hs_ensembl_id2omim.txt
hs_proteins.txt
id2name.txt
last_processed_gene.txt
name2id.txt
omim2disease.txt
omim_id2all_ortholog_data.txt
omim_id2disease_desc.txt
omim_id2disease_name.txt
omim_id2disease_notes.txt
omim_id2disease_synonyms.txt
omim_id2disease.txt
omim_id2gene_name.txt
omim_id2go_ids.txt
omim_id2phenotypes.txt
omim_reconfigured.txt
ortholog_other_data_hs_only.txt
ortholog_other_data.txt

Compile Interaction Data

Create precompiled gene page files specifically to populate interaction listing pages.

  Usage : ./steps/compile_interaction_data.pl ${RELEASE}
  Input : AceDB interaction data
 Output : Files in ${WORMBASE}/databases/${RELEASE}/interaction

compiled_interaction_data.txt

Create BLAST databases for available species

Build BLAST databases for available species. For some species, this includes databases for genes, ESTs, proteins, and genomic sequence. For others, only genomic sequence and protein databases are constructed.

  Usage : ./steps/create_blast_databases.pl ${RELEASE}
  Input : Genomic sequence and protein FASTA files mirrored from Sanger to
             ${FTP}/genomes/${SPECIES}/sequences/dna/${SPECIES}.dna.fa.gz;
          Gene and EST sequences derived from AceDB
 Output : BLAST databases in ${WORMBASE}/databases/${RELEASE}/blast

NB: Ensure that the database is entered in /usr/local/wormbase/website-classic/html/blast_blat/search_form.html and /usr/local/wormbase/website-classic/cgi_perl/searches/blast_blat

Create BLAT databases for available species

Build BLAT databases of genomic sequence for each available species.

  Usage : ./steps/create_blat_databases.pl ${RELEASE}
  Input : Genomic sequence FASTA files mirrored from Sanger to
             ${FTP}/genomes/${SPECIES}/sequences/dna/${SPECIES}.dna.fa.gz
 Output : BLAT .nib files in ${WORMBASE}/databases/${RELEASE}/blat
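
The BLAST, BLAT, and ePCR steps all read the same mirrored genomic FASTA file, so it can be worth confirming that the file is present and is a valid gzip archive before building. This is a sketch only: fasta_path and check_fasta are hypothetical helpers, and the path layout is copied from the Input lines in this document.

```shell
# fasta_path FTP_ROOT SPECIES
# Prints the expected location of the mirrored genomic FASTA file.
fasta_path() {
    printf '%s/genomes/%s/sequences/dna/%s.dna.fa.gz\n' "$1" "$2" "$2"
}

# check_fasta FTP_ROOT SPECIES
# Succeeds if the file exists, is non-empty, and passes gzip's
# integrity test (gzip -t).
check_fasta() {
    f=$(fasta_path "$1" "$2")
    if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        return 1
    fi
    gzip -t "$f"
}
```

For example, `check_fasta ~ftp/pub/wormbase c_elegans` before running create_blat_databases.pl.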

Create ePCR databases for available species

Build ePCR databases for each species.

  Usage : ./steps/create_epcr_databases.pl ${RELEASE}
  Input : Genomic sequence FASTA files mirrored from Sanger to
             ${FTP}/genomes/${SPECIES}/sequences/dna/${SPECIES}.dna.fa.gz
 Output : ePCR databases to ${WORMBASE}/databases/${RELEASE}/epcr

Load genomic GFF DBs for available species

Get the genomic GFF files from Sanger and load them into the DBs.

  Usage : ./steps/mirror_genomic_gffdb.pl ${RELEASE}
          ./steps/process_genomic_gff_files.pl ${RELEASE}
          ./steps/load_genomic_gffdb.pl ${RELEASE}
  Input : GFF and FASTA files mirrored from Sanger to
             GFF : ${FTP}/${SPECIES}/genome_feature_tables/GFF2/${SPECIES}.${VERSION}.gff.gz
             DNA : ${FTP}/${SPECIES}/sequences/dna/${SPECIES}.${VERSION}.dna.fa.gz
 Output : (These scripts both create/mirror and use the files above.)

Troubleshooting notes:

File and directory names need to be consistent with the format specified in Update.pm (circa line 36).
If necessary (e.g. files were incorrectly named), download them manually from the source site, uncompress them, and rename them correctly.
Progress can be monitored by checking the log file ..admin/development/logs/{WSRELEASE}/load genomic feature gff databases.log and by watching the mysql db files being built in /usr/local/mysql/data.
The C. elegans build is particularly complex. One apparent stumbling block is the permissions on the /usr/local/mysql/data/c_elegans_{RELEASE} directory, which should be set to 775.
Granting permission to web requests: mysql -u root -p -e 'grant select on *.* to "www-data"@localhost'
To restart the mysql db: sudo /etc/init.d/mysql restart; sudo /etc/init.d/httpd graceful
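
The 775 permissions mentioned in the notes above can be checked without relying on a particular stat implementation. This is a sketch with hypothetical helper names; it compares the permission string printed by ls -ld, so a directory with the setgid bit set (drwxrwsr-x) would not count as a match.

```shell
# dir_mode DIR
# Prints the 10-character type and permission string for DIR.
dir_mode() {
    ls -ld "$1" | cut -c1-10
}

# is_mode_775 DIR
# Succeeds if DIR is a directory with permissions rwxrwxr-x (775).
is_mode_775() {
    [ "$(dir_mode "$1")" = "drwxrwxr-x" ]
}
```

For example, `is_mode_775 /usr/local/mysql/data/c_elegans_WSXXX || echo "check permissions"`.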

Build and Load GFF patches

Create and load a number of patches for the c_elegans GFF database, including protein motifs and genetic limits.

  Usage : ./steps/load_gff_patches.pl ${RELEASE}
  Input : Files created in ${FTP}/genomes/c_elegans/genome_feature_tables/GFF2
 Output : Files created above.

Convert GFF2 into GFF3

Notes...

 Usage: ./steps/convert_gff2_to_gff3.pl ${RELEASE}

Create a GBrowse-driven genetic map

Notes...

 Usage: ./steps/load_gmap_gffdb.pl ${RELEASE}

Create a GBrowse-driven physical map

Notes...

 Usage: ./steps/load_pmap_gffdb.pl ${RELEASE}

Create dump files of common datasets

Notes...

Load the CLUSTALW database

Notes...

 Usage: ./steps/load_clustal_db.pl ${RELEASE}

Mirror annotation files from Sanger to the FTP site

Notes...

 Usage: ./steps/mirror_annotations.pl ${RELEASE}

Compiled File Table

Update Records
