Software Life Cycle: 1. Updating The Development Server

Revision as of 17:24, 8 June 2011

Overview

This document describes the process of staging a new release of WormBase on the development server.

The automated staging pipeline consists of:

  • a harness that handles logging, error trapping, and basic shared functions
  • a suite of modules -- one per step -- that implement the step or make calls to helper scripts
  • helper scripts in Perl or shell that assist in implementation

Control of the pipeline: you can run the pipeline in several ways:

  • Launch the full pipeline via the control script, the preferred and automated method.
  • Run individual steps in the context of the pipeline using control scripts in steps/, useful if the pipeline fails at a specific point.
  • Directly run helper scripts outside of the logging facilities of the pipeline, useful if you need to rebuild something quickly.

Document Conventions

The current development server is

 wb-dev: wb-dev.oicr.on.ca (FQDN); aka: dev.wormbase.org

When indicated, substitute WSXXX for ${RELEASE}.

System paths referred to in this document:

      FTP : /usr/local/ftp/pub/wormbase
 WORMBASE : /usr/local/wormbase
    ACEDB : /usr/local/wormbase/acedb
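
The path conventions above can be captured as shell variables so that commands in this document can be pasted as-is. This is only a sketch: the variable names follow the labels used in this document, and release_dir is a hypothetical helper, not part of the pipeline.

```shell
# Path conventions from this document, expressed as shell variables.
# Each default is the path listed above; override in the environment if needed.
FTP="${FTP:-/usr/local/ftp/pub/wormbase}"
WORMBASE="${WORMBASE:-/usr/local/wormbase}"
ACEDB="${ACEDB:-${WORMBASE}/acedb}"

# release_dir RELEASE
# Hypothetical helper: print the FTP directory that holds a given release.
release_dir() {
    printf '%s/releases/%s\n' "$FTP" "$1"
}
```

For example, release_dir WS150 prints the WS150 directory under ${FTP}/releases.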

Staging Pipeline Code

The update pipeline code is available in the website-admin repository on GitHub:

 tharris> git clone git@github.com:WormBase/website-admin.git
 tharris> cd website-admin/update

 lib/         -- the shared library suite that handles updates.
 staging/     -- code related to staging data on the development site.
 production/  -- code related to releasing data/code into production.

The contents are:

  logs/           -- the logs directory for each step/update
  bin/            -- Perl scripts for manually launching individual steps.
  README.txt      -- directory listing
  updatelog.conf  -- a configuration file for the update process
  update.sh       -- master script that fires off each step of the pipeline
  util/           -- various helper scripts for the update process

Running the Update Pipeline

Log Files

The Staging Pipeline creates informative logs for each step of the process. Logs are located at:

 /usr/local/wormbase/logs/staging/WSXXX
     master.log  -- Master log tracks all steps; useful for a meta-view of the pipeline. Contains INFO, WARN, ERROR, and FATAL messages.
     master.err  -- Master error log tracks ERROR and FATAL messages encountered across all steps.

Each individual step creates its own log file capturing STDERR and STDOUT containing informative messages from the pipeline. These are useful for tracking progress and triaging problems. For example:

 /usr/local/wormbase/logs/staging/WSXXX/build_blast_databases/
      step.log   -- step-specific log tracking everything from TRACE on up.
      step.err    -- step-specific error log tracking ERROR and FATAL messages. Good place to check if a step breaks.
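
When a run breaks, a quick way to find the failing step is to look for step.err files that are not empty. The function below is a minimal sketch, not part of the pipeline: failed_steps is a hypothetical name, and it assumes that a step which ran cleanly leaves its step.err empty or absent.

```shell
# failed_steps LOG_ROOT RELEASE
# Print the name of every step directory whose step.err is non-empty.
# Assumption: a clean step leaves step.err empty or does not create it.
failed_steps() {
    log_dir="$1/$2"
    for err in "$log_dir"/*/step.err; do
        # -s is true only for files that exist and are larger than zero bytes
        if [ -s "$err" ]; then
            basename "$(dirname "$err")"
        fi
    done
}
```

Usage: failed_steps /usr/local/wormbase/logs/staging WSXXX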

Executing the Pipeline

A single script fires off all steps of the process. Run it inside a screen session so that it keeps running if your connection drops.

 tharris> screen
 tharris> ./stage_via_pipeline.pl WSXXX
   (to disconnect your screen)
 tharris> ^a ^d
   (to resume your screen)
 tharris> screen -r   
 

Monitor progress of the update by following the master log file:

tharris> tail -f /usr/local/wormbase/logs/staging/WSXXX/master.log
 screen command reference

Update Steps

The steps that comprise the pipeline, the control script that launches each step, and the module that implements it are listed below.

 Step                                   Control script                        Module
 Mirror a new release                   steps/mirror_new_release.pl (manual)  W::U::Staging::MirrorNewRelease
 Unpack ACeDB                           steps/unpack_acedb.pl (manual)        W::U::Staging::UnpackAcedb
 Create BLAST databases                 steps/create_blast_databases.pl       W::U::Staging::CreateBlastDatabases
 Create BLAT databases                  steps/create_blat_databases.pl        W::U::Staging::CreateBlatDatabases
 Load Genomic GFF databases             steps/load_genomic_gff_databases.pl   W::U::Staging::LoadGenomicGFFDatabases
 Unpack and Load the ClustalW database  steps/unpack_clustalw_database.pl     W::U::Staging::UnpackClustalWDatabase
 Compile Gene Summary resources         steps/compile_gene_resources.pl       W::U::Staging::CompileGeneResources
 Compile Ontology resources             steps/compile_ontology_resources.pl   W::U::Staging::CompileOntologyResources
 Create commonly requested datasets     steps/dump_annotations.pl             W::U::Staging::DumpAnnotations
 Go Live                                steps/go_live.pl                      W::U::Staging::GoLive
 Precache content                       steps/precache_content.pl             W::U::Staging::PrecacheContent


  • Compile orthology resources
  • Compile interaction resources
  • Build and load GFF patches
  • Convert GFF2 into GFF3
  • Create a GBrowse-driven genetic map
  • Create a GBrowse-driven physical map
  • Update strains database


Mirror a new release

New releases are mirrored directly from the Hinxton FTP site to the primary WormBase FTP site hosted on wb-dev:/usr/local/ftp. This process is run via cron but can also be run manually.

 # Mirror the next incremental release newer than what we already have:
 # Cron: 
 ./steps/mirror_new_release.pl

 # Or mirror a specific release: 
 ./steps/mirror_new_release.pl --release WS150   # Mirror the WS150 release to /usr/local/ftp/pub/wormbase/releases/WS150
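
Before mirroring by hand, it helps to know which release is already on disk. The sketch below is not how mirror_new_release.pl decides what to fetch (that logic is not described here); latest_local_release is a hypothetical helper that assumes release directories are named WS followed by digits.

```shell
# latest_local_release RELEASES_DIR
# Print the highest-numbered WS release directory found in RELEASES_DIR.
# Assumption: release directories are named WS<number>, e.g. WS150.
latest_local_release() {
    latest=$(ls "$1" 2>/dev/null \
        | sed -n 's/^WS\([0-9][0-9]*\)$/\1/p' \
        | sort -n \
        | tail -n 1)
    if [ -z "$latest" ]; then
        echo "no WS releases found in $1" >&2
        return 1
    fi
    echo "WS${latest}"
}
```

Usage: latest_local_release /usr/local/ftp/pub/wormbase/releases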

Unpack AceDB

Unpack AceDB from the new release. Customize the new installation with skeletal files located at /usr/local/wormbase/website/classic/wspec. You will need approximately 25 GB of disk space per release.

via pipeline: ./steps/unpack_acedb.pl ${RELEASE}
via helper : helpers/unpack_acedb.sh ${RELEASE}
  Input : Files staged at ${FTP}/releases/${RELEASE}/species
 Output : Unpacked AceDB files in ${ACEDB}/wormbase_${RELEASE} 

When complete, you should have a new acedb directory containing:

   -- database
   -- wgf
   -- wquery
   -- wspec
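
The directory check above is easy to script. This is a minimal sketch rather than part of the pipeline; check_acedb_layout is a hypothetical name, and it only confirms that the four subdirectories exist, not that their contents are valid.

```shell
# check_acedb_layout DIR
# Return 0 if DIR contains the four expected AceDB subdirectories,
# otherwise report each missing one on STDERR and return 1.
check_acedb_layout() {
    status=0
    for sub in database wgf wquery wspec; do
        if [ ! -d "$1/$sub" ]; then
            echo "missing: $1/$sub" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: check_acedb_layout /usr/local/wormbase/acedb/wormbase_WSXXX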

Test the database by stopping the running AceDB process, restarting xinetd, and connecting with saceclient:

 > ps -ax | grep acedb               ## find the AceDB process number
 > kill -9 {AceDB proc number}       ## stop the current AceDB process
 > sudo /etc/init.d/xinetd restart
 > saceclient localhost -port 2005

Create BLAST databases

Build nucleotide and protein BLAST databases for species with genomic sequence and conceptual translations. In addition, for C. elegans and C. briggsae, we build BLAST databases for ESTs and "genes" (actually clones).

  Usage : ./steps/create_blast_databases.pl ${RELEASE}
  Input : Genomic sequence and protein FASTA files staged at:
             ${FTP}/releases/species/${SPECIES}.${RELEASE}.genomic.fa.gz
             ${FTP}/releases/species/${SPECIES}.${RELEASE}.protein.fa.gz
             Gene and EST sequences derived from AceDB
 Output : BLAST databases in ${WORMBASE}/databases/${RELEASE}/blast/${SPECIES}.

Create BLAT databases

Build BLAT databases of genomic sequence.

  Usage : ./steps/create_blat_databases.pl ${RELEASE}
  Input : Genomic sequence FASTA files staged at
             ${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.genomic.fa.gz
 Output : BLAT .nib files in ${WORMBASE}/databases/${RELEASE}/blat/${SPECIES}

Load genomic GFF annotations

Convert GFF files into Bio::DB::GFF (GFF2) or Bio::DB::SeqFeature::Store (GFF3) databases.

 Usage : ./steps/load_genomic_gff_databases.pl ${RELEASE}
 Input : GFF and FASTA files staged at:
           GFF : ${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.gff[2|3].gz
           DNA : ${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.genomic.fa.gz

Unpack and Load the CLUSTALW database

  Usage : ./steps/load_clustal_db.pl ${RELEASE}
  Input : ${FTP}/releases/${RELEASE}/COMPARATIVE_ANALYSIS/wormpep${RELEASE}.clw.sql.bz2
 Output : A new MySQL database called clustal_${RELEASE}

Compile Gene Resources

Create precompiled gene page files specifically to populate the Phenotype tables.

  Usage : ./steps/compile_gene_resources.pl ${RELEASE}
  Input : AceDB data
 Output : Files in ${WORMBASE}/databases/${RELEASE}/gene
  • gene_rnai_pheno.txt (gene/gene)
  • gene_xgene_pheno.txt (gene/gene)
  • phenotype_id2name.txt (gene/gene)
  • rnai_data.txt (gene/gene)
  • variation_data.txt (gene/gene)

Compile Ontology Resources

TODO: This step relies on a number of external helper scripts that should ALL be folded into CompileGeneResources. They are located at

staging/helpers/gene_summary 

Take the mirrored ontology files and compile them into the databases for the ontology searches.

  Usage : ./steps/compile_ontology_resources.pl ${RELEASE}
  Input : OBO files staged at: /usr/local/ftp/pub/wormbase/releases/WSXXX/ONTOLOGY
          compiled data files from Compile Gene Resources step
  • anatomy_association.RELEASE.wb
  • anatomy_ontology.RELEASE.obo
  • gene_association.RELEASE.wb
  • gene_association.RELEASE.wb.cb
  • gene_association.RELEASE.wb.ce
  • gene_association.RELEASE.wb.cjp
  • gene_association.RELEASE.wb.ppa
  • gene_association.RELEASE.wb.rem
  • gene_ontology.RELEASE.obo
  • phenotype_association.RELEASE.wb
  • phenotype_ontology.RELEASE.obo
 Output : Files in ${WORMBASE}/databases/${RELEASE}/ontology:
  • id2association_counts.txt (ontology/tree_lister)
  • id2name.txt (ontology/tree_lister)
  • id2parents.txt (ontology/tree_lister)
  • id2total_associations.txt (ontology/tree_lister)
  • name2id.txt
  • search_data.txt
  • parent2ids.txt (ontology/tree_lister)


Compile Orthology Resources

Create precompiled files for orthology and disease display and search.

  Usage : ./steps/compile_gene_data.pl ${RELEASE}
          ./steps/compile_ortholog_data.pl ${RELEASE}
          ./steps/compile_orthology_resources.pl ${RELEASE}
  Input : AceDB data, omim.txt and morbidmap files from OMIM, ontology resources files
  • gene_association.$RELEASE.wb.ce
  • gene_id2go_bp.txt
  • gene_id2go_mf.txt
  • gene_id2phenotype.txt
  • gene_list.txt
  • last_processed_gene.txt
  • ortholog_other_data.txt
  Intermediate: 
  • all_proteins.txt
  • disease_page_data.txt
  • full_disease_data.txt
  • hs_proteins.txt
  • omim2disease.txt
  • omim_id2go_ids.txt
  • omim_id2phenotypes.txt
  • omim_id2disease_synonyms.txt
  • omim_reconfigured.txt
  • ortholog_other_data_hs_only.txt
 Output : Files in ${WORMBASE}/databases/${RELEASE}/orthology (the summary pages that use each file are given in parentheses)
  • disease_search_data.txt (orthology/search)
  • gene_id2omim_ids.txt (orthology/disease)
  • go_id2omim_ids.txt (orthology/disease,ontology/gene)
  • go_ids2descendants.txt (orthology/gene)
  • hs_ensembl_id2omim.txt (orthology/gene)
  • id2name.txt (orthology/disease, orthology/gene)
  • name2id.txt (orthology/disease)
  • omim_id2all_ortholog_data.txt (orthology/disease)
  • omim_id2disease_desc.txt (orthology/disease)
  • omim_id2disease_name.txt (orthology/disease,ontology/gene)
  • omim_id2disease_notes.txt (orthology/disease)
  • omim_id2disease.txt (orthology/gene)
  • omim_id2gene_name.txt (orthology/search)

Create files of commonly requested datasets

  Usage : ./steps/dump_annotations.pl ${RELEASE}
 Output : Datasets in ${FTP}/releases/${RELEASE}/annotations and species/annotations

The staging harness will automatically run scripts in annotation_dumpers/*. These scripts should abide by the following conventions:

   1. Be located in update/staging/annotation_dumpers
   2. Be named either
          dump_species_*   for species level data (like brief IDs)
          dump_resource_*  for resource level data (like laboratories)
   3. Follow existing examples, including available parameters.
   4. Dump to STDERR and STDOUT.

   Notes:

   1. dump_species_* will be called for each species managed by WormBase
      and will end up in
         ${FTP_ROOT}/releases/[RELEASE]/species/[G_SPECIES]/annotation/[G_SPECIES].[RELEASE].[DESCRIPTION].txt
      dump_resource_* will be called once and end up in
         ${FTP_ROOT}/datasets-wormbase/wormbase.[RELEASE].[DESCRIPTION].txt
   2. The filename will be created by stripping off dump_species_ or dump_resource_.
      Species-specific resources will be prepended with the appropriate species.
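
The naming rules above can be sketched as a small shell function that maps a dumper script name to the file it should produce. This is an illustration of the convention, not the harness code: dumper_output_path is a hypothetical name, the example script and species names are made up, and stripping the script's file extension is an assumption.

```shell
# dumper_output_path FTP_ROOT RELEASE SCRIPT [G_SPECIES]
# Print the output path implied by the dumper naming convention.
# Assumption: any file extension on SCRIPT (e.g. .pl) is dropped.
dumper_output_path() {
    ftp_root=$1
    release=$2
    species=${4:-}
    name=$(basename "$3")
    name=${name%.*}
    case "$name" in
        dump_species_*)
            desc=${name#dump_species_}
            if [ -z "$species" ]; then
                echo "species required for $name" >&2
                return 1
            fi
            printf '%s/releases/%s/species/%s/annotation/%s.%s.%s.txt\n' \
                "$ftp_root" "$release" "$species" "$species" "$release" "$desc"
            ;;
        dump_resource_*)
            desc=${name#dump_resource_}
            printf '%s/datasets-wormbase/wormbase.%s.%s.txt\n' \
                "$ftp_root" "$release" "$desc"
            ;;
        *)
            echo "not a dumper script: $3" >&2
            return 1
            ;;
    esac
}
```

Usage: dumper_output_path /usr/local/ftp/pub/wormbase WSXXX dump_resource_laboratories.pl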


Compile Interaction Data

DEPRECATED. NO NEED TO MIGRATE THIS INTO THE NEW STAGING PIPELINE.

Create precompiled gene page files specifically to populate interaction listing pages.

  Usage : ./steps/compile_interaction_data.pl ${RELEASE}
  Input : AceDB interaction data
 Output : Files ${WORMBASE}/databases/${RELEASE}/interaction
  • compiled_interaction_data.txt

Convert GFF2 into GFF3

Notes...

 Usage: ./steps/convert_gff2_to_gff3.pl ${RELEASE}

Create a GBrowse-driven genetic map

Notes...

 Usage: ./steps/load_gmap_gffdb.pl ${RELEASE}

Create a GBrowse-driven physical map

Notes...

 Usage: ./steps/load_pmap_gffdb.pl ${RELEASE}

Go Live

steps/go_live.pl WSXXX

This script will

  • create a series of symlinks in the FTP site (for example, to maintain the virtually organized species/ directory)
  • create "current" symlinks in the FTP site for easy access.
  • adjust symlinks to mysql GFF databases updated this release.
  • adjust the symlink at /usr/local/wormbase/acedb/wormbase to point to the newly unpacked wormbase_WSXXX AceDB directory.

If you omit the WSXXX on the command line, the script will simply organize the virtual directories on the ftp site up to and including the current release. MySQL and AceDB symlinks will not be created.
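
If the AceDB symlink ever needs to be repointed by hand, the operation looks roughly like the sketch below. This is not the code in go_live.pl: point_acedb_at is a hypothetical helper, and using a relative link target is an assumption.

```shell
# point_acedb_at ACEDB_ROOT RELEASE
# Repoint ACEDB_ROOT/wormbase at ACEDB_ROOT/wormbase_RELEASE.
# Refuses to switch if the release directory does not exist.
point_acedb_at() {
    target="wormbase_$2"
    if [ ! -d "$1/$target" ]; then
        echo "no such directory: $1/$target" >&2
        return 1
    fi
    # -n replaces an existing symlink instead of creating a link inside
    # the directory it currently points to; -f overwrites the old link.
    ln -sfn "$target" "$1/wormbase"
}
```

Usage: point_acedb_at /usr/local/wormbase/acedb WSXXX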

Compiled file documentation

 Step                    File                   Description                     WB2 update
 Compile Gene Resources  gene_rnai_pheno.txt    steps/unpack_acedb.pl (manual)  W::U::Staging::UnpackAcedb
 Compile Gene Resources  gene_xgene_pheno.txt   steps/unpack_acedb.pl (manual)  W::U::Staging::UnpackAcedb
 Compile Gene Resources  phenotype_id2name.txt  steps/unpack_acedb.pl (manual)  W::U::Staging::UnpackAcedb
 Compile Gene Resources  rnai_data.txt          steps/unpack_acedb.pl (manual)  W::U::Staging::UnpackAcedb
 Compile Gene Resources  variation_data.txt     steps/unpack_acedb.pl (manual)  W::U::Staging::UnpackAcedb

Update Records

Update Matrix WS205

Update Matrix WS206

Update Matrix WS207

Update Matrix WS208

Update Matrix WS209

Update Matrix WS210

Update Matrix WS211

Update Matrix WS212

Update Matrix WS213

Update Matrix WS214

Update Matrix WS215

Update Matrix WS216

Update Matrix WS217

Update Matrix WS220