Software Life Cycle: 1. Updating The Development Server
Revision as of 21:50, 17 May 2011
Contents
- 1 Overview
- 2 Document Conventions
- 3 Staging Pipeline Code
- 4 Running the Update Pipeline
- 4.1 Dry Run
- 4.2 Live Run
- 4.3 Check logs and resolve issues
- 4.4 Update Steps
- 4.5 Update Steps
- 4.5.1 Create necessary directories
- 4.5.2 Mirror and unpack ACeDB from Sanger
- 4.5.3 Compile Gene Resources
- 4.5.4 Mirror ontology from Sanger
- 4.5.5 Compile Ontology Resources
- 4.5.6 Compile Orthology Resources
- 4.5.7 Compile Interaction Data
- 4.5.8 Create BLAST databases for available species
- 4.5.9 Create BLAT databases for available species
- 4.5.10 Create ePCR databases for available species
- 4.5.11 Load genomic GFF DBs for available species
- 4.5.12 Build and Load GFF patches
- 4.5.13 Convert GFF2 into GFF3
- 4.5.14 Create a GBrowse-driven genetic map
- 4.5.15 Create a GBrowse-driven physical map
- 4.5.16 Create dump files of common datasets
- 4.5.17 Load the CLUSTALW database
- 4.5.18 Mirror annotation files from Sanger to the FTP site
- 4.6 Compiled File Table
- 4.7 Update Records
Overview
This document describes the process of preparing and staging a new release of WormBase on the development server.
This process is entirely automated; each step is described in more detail below to assist in troubleshooting.
Document Conventions
The current development server is
wb-dev: wb-dev.oicr.on.ca (FQDN); aka: dev.wormbase.org
Wherever ${RELEASE} appears, substitute the release name (WSXXX).
System paths referred to in this document:
FTP      : /usr/local/ftp/pub/wormbase
WORMBASE : /usr/local/wormbase
ACEDB    : /usr/local/wormbase/acedb
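The paths above can be captured as shell variables so that commands using ${WORMBASE} or ${ACEDB} can be pasted as written. This is only a minimal sketch: the helper name release_db_dir is hypothetical and is not part of the pipeline code.

```shell
# System paths as listed above.
FTP=/usr/local/ftp/pub/wormbase
WORMBASE=/usr/local/wormbase
ACEDB=${WORMBASE}/acedb

# Hypothetical helper (not part of the pipeline): print the per-release
# staging directory, ${WORMBASE}/databases/${RELEASE}.
release_db_dir() {
    printf '%s/databases/%s\n' "$WORMBASE" "$1"
}
```

For example, `release_db_dir WS150` prints /usr/local/wormbase/databases/WS150.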
Staging Pipeline Code
The update pipeline code is available in the website-admin module on github:
tharris> git clone git@github.com:WormBase/website-admin.git
tharris> cd website-admin/update
lib/         -- the shared library suite that handles updates.
development/ -- code related to staging data on the development site.
production/  -- code related to releasing data/code into production.
The contents are:
logs/          -- the logs directory for each step/update
bin/           -- Perl scripts for manually launching individual steps.
README.txt     -- directory listing
updatelog.conf -- a configuration file for the update process
update.sh      -- master script that fires off each step of the pipeline
util/          -- various helper scripts for the update process
Running the Update Pipeline
Dry Run
A single shell script fires off all steps of the process. You should run it inside a screen.
tharris> screen
You can first run it as a dryrun. This will test for the presence of all input files without running any lengthy builds.
tharris> ./bin/staging.pl dryrun
Monitor progress of the update by following the master log file:
tharris> tail -f /usr/local/wormbase/logs/staging_updates/WSXXX/master.log
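When the log is too long to follow by eye, a quick count of problem lines can help. The sketch below assumes that the log marks problems with the literal words ERROR or WARN, which this page does not confirm; count_log_problems is a hypothetical helper name.

```shell
# Count the lines in a log file that mention ERROR or WARN.
# Assumption: the log marks problems with these literal words.
count_log_problems() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's status clean.
    grep -c -E 'ERROR|WARN' "$1" || true
}
```

Usage: `count_log_problems /usr/local/wormbase/logs/staging_updates/WSXXX/master.log`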
Live Run
tharris> screen
tharris> ./bin/stage.pl
Monitor the progress as before.
However, it is best to run each step individually, periodically checking the log to monitor progress.
> ./steps/<individual_update_script>.pl
The steps to perform are:
- Create necessary directories
- Mirror and unpack ACeDB from Sanger (bin/mirror_new_release.pl; run under cron)
- Compile Gene Resources
- Mirror ontology files from Sanger
- Compile ontology resources for the site
- Compile orthology resources
- Compile interaction resources
- Create BLAST databases for available species
- Create BLAT databases for available species
- Create ePCR databases for select species
- Load genomic GFF databases for available species
- Build and load GFF patches
- Convert GFF2 into GFF3
- Create a GBrowse-driven genetic map
- Create a GBrowse-driven physical map
- Update strains database
- Create dump files of common datasets
- Mirror annotation files from Sanger to the FTP site
- Load CLUSTAL db
Each step is described below.
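Running the steps one at a time can be semi-automated with a small wrapper that stops at the first failure, so the log can be checked before continuing. This is a sketch only: run_steps and the STEPS_DIR variable are hypothetical, and it assumes each step script exits non-zero on failure, which this page does not state.

```shell
# Run the named step scripts in order for one release, stopping at the
# first one that fails. STEPS_DIR defaults to ./steps as used on this page.
run_steps() {
    release=$1
    shift
    for step in "$@"; do
        echo "running ${step} ${release}"
        "${STEPS_DIR:-./steps}/${step}" "$release" || {
            echo "FAILED: ${step}" >&2
            return 1
        }
    done
}
```

Usage: `run_steps WSXXX create_directories.pl mirror_acedb.pl unpack_acedb.pl`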
Check logs and resolve issues
Log files are under admin/development/logs, and the logging levels are set to record ERRORS and WARNINGS.
NB: While logs provide excellent records of the loading process, it is important to check that the web pages that use the data function properly. Links to these pages are provided with the description of each step.
Update Steps
All update steps should be run within a screen so that the process can continue even if your terminal receives a SIGHUP.
Mirror a new release
New releases are mirrored directly from the Hinxton FTP site to the primary WormBase FTP site. This process is run via cron but can also be run manually.
Mirror the next incremental release newer than what we already have:
./bin/mirror_new_release.pl
Or mirror a specific release:
./bin/mirror_new_release.pl WS150   # Mirror the WS150 release to /usr/local/ftp/pub/wormbase/releases/WS150
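To see which release would count as the next incremental one, it can help to look at what is already mirrored. The sketch below is not how mirror_new_release.pl is implemented. It assumes releases are stored as directories named WS followed by a number, as in the releases/WS150 path above, and next_release is a hypothetical helper name.

```shell
# Print the name of the release that follows the highest-numbered
# WS<number> entry found in the given releases directory.
next_release() {
    # Keep only names of the form WS<digits>, strip the prefix and any
    # leading zeros, then take the numerically largest value.
    latest=$(ls "$1" | sed -n 's/^WS0*\([1-9][0-9]*\)$/\1/p' | sort -n | tail -n 1)
    [ -n "$latest" ] || return 1
    echo "WS$(( latest + 1 ))"
}
```

Usage: `next_release /usr/local/ftp/pub/wormbase/releases`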
Stage a new release
Update Steps
Create necessary directories
Create staging directories for the update process.
Usage  : ./steps/create_directories.pl ${RELEASE}
Output : Directories in ${WORMBASE}/databases (primarily)
Mirror and unpack ACeDB from Sanger
Mirror and unpack the new release of the database from Sanger. Add the appropriate control files for the new acedb database (serverpasswd.wrm, passwd.wrm, serverconfig.wrm), taken from the checked-out development source (/usr/local/wormbase/wspec).
Files will be mirrored and unpacked to /usr/local. Please make sure that there is sufficient space in this directory! You will most likely need approximately 25 GB of disk space. Possible places to free up disk space:
/usr/local/mysql/data
/usr/local/wormbase/acedb/tmp
~{you}/mp3s
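The free space can be checked before starting the mirror. This is a minimal sketch using the POSIX output format of df; has_free_gb is a hypothetical helper name, and the 25 GB figure in the usage line is the estimate given above.

```shell
# Succeed if the filesystem holding DIR has at least GB gigabytes available.
# Usage: has_free_gb DIR GB
has_free_gb() {
    # "df -Pk" prints one line per filesystem in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}
```

Usage: `has_free_gb /usr/local 25 || echo "Less than 25 GB free under /usr/local"`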
Usage  : ./steps/mirror_acedb.pl ${RELEASE}
         ./steps/unpack_acedb.pl ${RELEASE}
Input  : Files mirrored from Sanger to ${ACEDB}/tmp
Output : Unpacked AceDB files in ${ACEDB}/wormbase_${RELEASE}
Note: This can take a *long* time. You might want to run this in a screen:
> screen
> ./steps/mirror_acedb.pl WSXXX
(to disconnect your screen)
> ^a ^d
(to resume your screen)
> screen -r
screen command reference
When complete:
> A new acedb directory, wormbase_${RELEASE}, should have been created and should contain the subdirectories:
   -- database
   -- wgf
   -- wquery
   -- wspec
> Check that the following directory and symlink exist: ${ACEDB}/wormbase -> wormbase_${RELEASE}
> Check that wspec contains the correct configuration files: serverconfig.wrm, server.wrm, serverpasswd.wrm
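The directory and symlink checks above can be scripted. The following is a sketch under stated assumptions: check_acedb_release is a hypothetical helper name, and the symlink is accepted whether its target is written as a relative or an absolute path. It relies on readlink, which is available on most Linux systems.

```shell
# Check the unpacked release under an AceDB root directory.
# Usage: check_acedb_release ACEDB_ROOT RELEASE
check_acedb_release() {
    root=$1
    dir="$1/wormbase_$2"
    status=0
    # The four subdirectories listed above must all be present.
    for sub in database wgf wquery wspec; do
        [ -d "$dir/$sub" ] || { echo "missing: $dir/$sub" >&2; status=1; }
    done
    # The "wormbase" symlink should point at the new release directory.
    target=$(readlink "$root/wormbase")
    [ "$(basename "$target")" = "wormbase_$2" ] || {
        echo "symlink $root/wormbase does not point to wormbase_$2" >&2
        status=1
    }
    return $status
}
```

Usage: `check_acedb_release /usr/local/wormbase/acedb WSXXX`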
It is also good to check for a functional db -- try to connect to the acedb via a test script that creates a db handle. It may be necessary to restart the database:
> ps -ax | grep acedb             ## to get acedb process number
> kill -9 {AceDB proc number}     ## stop current acedb process
> sudo /etc/init.d/xinetd restart
> ## run command for test script ##
Troubleshooting notes:
- gzip version incompatibilities may break the step.
Compile Gene Resources
Create precompiled gene page files specifically to populate the Phenotype tables.
Usage  : ./steps/compile_gene_resource.pl ${RELEASE}
Input  : AceDB data
Output : Files in ${WORMBASE}/databases/${RELEASE}/gene
- gene_rnai_pheno.txt
- gene_xgene_pheno.txt
- phenotype_id2name.txt
- rnai_data.txt
- variation_data.txt
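After the step finishes, the presence of the files listed above can be confirmed with a short check. This is a sketch: check_outputs is a hypothetical helper name, and treating an empty file as a failure is an assumption, not something the pipeline is known to enforce.

```shell
# Report any expected output file that is missing or empty.
# Usage: check_outputs DIR FILE...
check_outputs() {
    out_dir=$1
    shift
    missing=0
    for f in "$@"; do
        # -s is true only for files that exist and are larger than zero bytes.
        [ -s "$out_dir/$f" ] || {
            echo "missing or empty: $out_dir/$f" >&2
            missing=1
        }
    done
    return $missing
}
```

Usage: `check_outputs /usr/local/wormbase/databases/WSXXX/gene gene_rnai_pheno.txt gene_xgene_pheno.txt phenotype_id2name.txt rnai_data.txt variation_data.txt`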
Mirror ontology from Sanger
Mirror OBO files from Sanger. These are necessary for the ontology searches.
Usage  : ./steps/mirror_ontology_files.pl ${RELEASE}
Input  : none
Output : Files mirrored to ${WORMBASE}/databases/${RELEASE}/ontology
Compile Ontology Resources
Take the mirrored files and compile them into the databases for the ontology searches.
Usage  : ./steps/compile_ontology_resources.pl ${RELEASE}
Input  : OBO files mirrored earlier in ${WORMBASE}/databases/${RELEASE}/ontology;
         compiled data files from the Compile Gene Resources step
Output : Files in ${WORMBASE}/databases/${RELEASE}/ontology:
- anatomy_association.RELEASE.wb
- gene_association.RELEASE.wb.ce
- gene_ontology.RELEASE.obo
- name2id.txt
- search_data.txt
- anatomy_ontology.RELEASE.obo
- gene_association.RELEASE.wb.cjp
- id2association_counts.txt
- parent2ids.txt
- gene_association.RELEASE.wb
- gene_association.RELEASE.wb.ppa
- id2name.txt
- phenotype_association.RELEASE.wb
- gene_association.RELEASE.wb.cb
- gene_association.RELEASE.wb.rem
- id2parents.txt
- phenotype_ontology.RELEASE.obo
Compile Orthology Resources
Create precompiled files for the orthology and disease displays and searches.
Usage  : ./steps/compile_gene_data.pl ${RELEASE}
         ./steps/compile_ortholog_data.pl ${RELEASE}
         ./steps/compile_orthology_resources.pl ${RELEASE}
Input  : AceDB data, omim.txt and morbidmap files from OMIM, ontology resources files
Output : Files in ${WORMBASE}/databases/${RELEASE}/orthology
- all_proteins.txt
- disease_page_data.txt
- disease_search_data.txt
- full_disease_data.txt
- gene_association.$RELEASE.wb.ce
- gene_id2go_bp.txt
- gene_id2go_mf.txt
- gene_id2omim_ids.txt
- gene_id2phenotype.txt
- gene_list.txt
- go_id2omim_ids.txt
- go_ids2descendants.txt
- hs_ensembl_id2omim.txt
- hs_proteins.txt
- id2name.txt
- last_processed_gene.txt
- name2id.txt
- omim2disease.txt
- omim_id2all_ortholog_data.txt
- omim_id2disease_desc.txt
- omim_id2disease_name.txt
- omim_id2disease_notes.txt
- omim_id2disease_synonyms.txt
- omim_id2disease.txt
- omim_id2gene_name.txt
- omim_id2go_ids.txt
- omim_id2phenotypes.txt
- omim_reconfigured.txt
- ortholog_other_data_hs_only.txt
- ortholog_other_data.txt
Compile Interaction Data
Create precompiled gene page files specifically to populate interaction listing pages.
Usage  : ./steps/compile_interaction_data.pl ${RELEASE}
Input  : AceDB interaction data
Output : Files in ${WORMBASE}/databases/${RELEASE}/interaction
- compiled_interaction_data.txt
Create BLAST databases for available species
Build BLAST databases for available species. For some species, this includes databases for genes, ESTs, proteins, and genomic sequence. For others, only genomic sequence and protein databases are constructed.
Usage  : ./steps/create_blast_databases.pl ${RELEASE}
Input  : Genomic sequence and protein FASTA files mirrored from Sanger to
         ${FTP}/genomes/${SPECIES}/sequences/dna/${SPECIES}.dna.fa.gz;
         Gene and EST sequences derived from AceDB
Output : BLAST databases in ${WORMBASE}/databases/${RELEASE}/blast
NB: Ensure that the database is entered in /u/l/w/website-classic/html/blast_blat/search_form.html and /u/l/w/website-classic/cgi_perl/searches/blast_blat
Create BLAT databases for available species
Build BLAT databases of genomic sequence for each available species.
Usage  : ./steps/create_blat_databases.pl ${RELEASE}
Input  : Genomic sequence FASTA files mirrored from Sanger to
         ${FTP}/genomes/${SPECIES}/sequences/dna/${SPECIES}.dna.fa.gz
Output : BLAT .nib files in ${WORMBASE}/databases/${RELEASE}/blat
Create ePCR databases for available species
Build ePCR databases for each species.
Usage  : ./steps/create_epcr_databases.pl ${RELEASE}
Input  : Mirrored genomic sequence files from Sanger to
         ${FTP}/genomes/${SPECIES}/sequences/dna/${SPECIES}.dna.fa.gz
Output : ePCR databases in ${WORMBASE}/databases/${RELEASE}/epcr
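The BLAST, BLAT and ePCR steps all read the same genomic FASTA file for each species, so it can be worth confirming that these inputs exist before launching any of them. The sketch below builds the path from the pattern given above. The helper names genomic_fasta and report_missing_fasta are hypothetical, and the species names in the usage line are examples only.

```shell
# FTP root as given in Document Conventions (can be overridden).
FTP=${FTP:-/usr/local/ftp/pub/wormbase}

# Print the expected genomic FASTA path for one species, following
# ${FTP}/genomes/${SPECIES}/sequences/dna/${SPECIES}.dna.fa.gz
genomic_fasta() {
    printf '%s/genomes/%s/sequences/dna/%s.dna.fa.gz\n' "$FTP" "$1" "$1"
}

# Print each species whose genomic FASTA is missing or empty.
report_missing_fasta() {
    for sp in "$@"; do
        [ -s "$(genomic_fasta "$sp")" ] || echo "$sp"
    done
}
```

Usage: `report_missing_fasta c_elegans c_briggsae`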
Load genomic GFF DBs for available species
Get genomic GFF files from Sanger and load them into the databases.
Usage  : ./steps/mirror_genomic_gffdb.pl ${RELEASE}
         ./steps/process_genomic_gff_files.pl ${RELEASE}
         ./steps/load_genomic_gffdb.pl ${RELEASE}
Input  : GFF and FASTA files mirrored from Sanger to
         GFF : ${FTP}/${SPECIES}/genome_feature_tables/GFF2/${SPECIES}.${VERSION}.gff.gz
         DNA : ${FTP}/${SPECIES}/sequences/dna/${SPECIES}.${VERSION}.dna.fa.gz
Output : (This script both creates/mirrors and uses the files above).
Troubleshooting notes:
- File and directory names need to be consistent with format specified in Update.pm circa line 36
- If necessary, e.g. files were incorrectly named, they should be manually downloaded from the source site, uncompressed and renamed correctly
- Progress can be monitored by checking the log file ..admin/development/logs/{WSRELEASE}/load genomic feature gff databases.log and the building of the mysql db files in /usr/local/mysql/data
- The C. elegans build is particularly complex. One apparent stumbling block is the permissions on the /u/l/mysql/data/c_elegans_{RELEASE} directory (set to 775).
- Granting permission to web requests: mysql -u root -p -e 'grant select on *.* to "www-data"@localhost'
- To restart mysql db: sudo /etc/init.d/mysql restart; sudo /etc/init.d/httpd graceful
Build and Load GFF patches
Create and load a number of patches for the c_elegans GFF database, including protein motifs and genetic limits.
Usage  : ./steps/load_gff_patches.pl ${RELEASE}
Input  : Files created in ${FTP}/genomes/c_elegans/genome_feature_tables/GFF2
Output : Files created above.
Convert GFF2 into GFF3
Notes...
Usage: ./steps/convert_gff2_to_gff3.pl ${RELEASE}
Create a GBrowse-driven genetic map
Notes...
Usage: ./steps/load_gmap_gffdb.pl ${RELEASE}
Create a GBrowse-driven physical map
Notes...
Usage: ./steps/load_pmap_gffdb.pl ${RELEASE}
Create dump files of common datasets
Notes...
Load the CLUSTALW database
Notes...
Usage: ./steps/load_clustal_db.pl ${RELEASE}
Mirror annotation files from Sanger to the FTP site
Notes...
Usage: ./steps/mirror_annotations.pl ${RELEASE}