Software Life Cycle: 1. Updating The Development Server
THIS DOCUMENT IS NOW DEPRECATED. PLEASE REFER TO THE PROJECT DOCUMENTATION MAINTAINED ON GOOGLE DRIVE.
https://docs.google.com/a/wormbase.org/document/d/1oPpj8d5gibUc-gpUZorl6ETT5baE6mp-v2bMedKauiA/edit#
Contents
- 1 Overview
- 2 Document Conventions
- 3 Staging Pipeline Code
- 4 Running the Update Pipeline
- 4.1 Log Files
- 4.2 Executing the Pipeline
- 4.3 Update Steps
- 4.4 Pre-compiled file documentation for Beta
- 4.4.1 Compile Ontology Resources
- 4.4.2 Compile Orthology Resources
- 4.4.3 Compile Gene Resources
- 4.4.4 Compile Ontology Resources
- 4.4.5 Compile Orthology Resources
- 4.4.6 Compile Interaction Data
- 4.4.7 Convert GFF2 into GFF3
- 4.4.8 Create files of commonly requested datasets
- 4.4.9 Create a GBrowse-driven genetic map
- 4.4.10 Go Live
- 4.4.11 Branch the web code
- 5 Steps to execute after a release has been staged
Overview
This document describes the process of staging a new release of WormBase on the development server.
The automated staging pipeline consists of:
- a harness that handles logging, error trapping, and basic shared functions
- a suite of modules -- one per step -- that implement the step or make calls to helper scripts
- helper scripts in Perl or shell that assist in implementation
Control of the pipeline: you can use the pipeline in several ways (a command-line sketch follows this list):
- Launch the full pipeline via the control script, the preferred and automated method.
- Run individual steps in the context of the pipeline using control scripts in steps/, useful if the pipeline fails at a specific point.
- Directly run helper scripts outside of the logging facilities of the pipeline, useful if you need to rebuild something quickly.
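For orientation, the three modes look roughly like this on the command line; the step and helper names are examples taken from later sections, and the working directory is assumed to be the update/staging area of the website-admin checkout:

  # 1. Full pipeline (preferred): run every step for a release under the harness
  ./stage_via_pipeline.pl WSXXX

  # 2. A single step, still under the harness (logging and error trapping apply)
  ./steps/create_blast_databases.pl WSXXX

  # 3. A helper script directly, outside the pipeline's logging facilities
  helpers/unpack_acedb.sh WSXXX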
Document Conventions
The current development server is
wb-dev: wb-dev.oicr.on.ca (FQDN); aka: dev.wormbase.org
Where you see ${RELEASE} or WSXXX below, substitute the actual release (for example, WS150).
System paths referred to in this document:
  FTP      : /usr/local/ftp/pub/wormbase
  WORMBASE : /usr/local/wormbase
  ACEDB    : /usr/local/wormbase/acedb
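If you run steps by hand, it can be convenient to export these as environment variables so the commands below can be pasted as written; this is only a convenience sketch, not something the pipeline requires:

  export RELEASE=WSXXX                    # substitute the actual release, e.g. WS150
  export FTP=/usr/local/ftp/pub/wormbase
  export WORMBASE=/usr/local/wormbase
  export ACEDB=/usr/local/wormbase/acedb
  export SPECIES=c_elegans                # example species used in the file paths below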
Staging Pipeline Code
The update pipeline code is available in the website-admin module on github:
  tharris> git clone git@github.com:WormBase/website-admin.git
  tharris> cd website-admin/update

  lib/        -- the shared library suite that handles updates
  staging/    -- code related to staging data on the development site
  production/ -- code related to releasing data/code into production
The contents are:
  logs/          -- the logs directory for each step/update
  bin/           -- Perl scripts for manually launching individual steps
  README.txt     -- directory listing
  updatelog.conf -- a configuration file for the update process
  update.sh      -- master script that fires off each step of the pipeline
  util/          -- various helper scripts for the update process
Running the Update Pipeline
Log Files
The Staging Pipeline creates informative logs for each step of the process. Logs are located at:
  /usr/local/wormbase/logs/staging/WSXXX
      master.log -- Master log tracks all steps; useful for a meta-view of the pipeline.
                    Contains INFO, WARN, ERROR, and FATAL messages.
      master.err -- Master error log tracks ERROR and FATAL messages encountered across all steps.
Each individual step creates its own log file capturing STDERR and STDOUT containing informative messages from the pipeline. These are useful for tracking progress and triaging problems. For example:
  /usr/local/wormbase/logs/staging/WSXXX/build_blast_databases/
      step.log -- step-specific log tracking everything from TRACE on up.
      step.err -- step-specific error log tracking ERROR and FATAL messages.
                  Good place to check if a step breaks.
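When a run fails, a quick way to find the offending step is to look for non-empty step error logs; a sketch, assuming the layout shown above:

  # list any step whose error log is non-empty for this release
  find /usr/local/wormbase/logs/staging/WSXXX -name step.err -size +0c -print

  # then inspect the end of the corresponding step log
  tail -n 50 /usr/local/wormbase/logs/staging/WSXXX/build_blast_databases/step.log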
Executing the Pipeline
A single script fires off all steps of the process. You should run it inside a screen.
  tharris> screen
  tharris> ./stage_via_pipeline.pl WSXXX

  (to disconnect your screen session)
  tharris> ^a ^d

  (to resume your screen session)
  tharris> screen -r
Monitor progress of the update by following the master log file:
tharris> tail -f /usr/local/wormbase/logs/staging/WSXXX/master.log
screen command reference
Update Steps
The steps that comprise the pipeline, the script that launches each one, and the module that implements it are listed below.
Step | Control script | Module |
---|---|---|
Mirror a new release | steps/mirror_new_release.pl (manual) | W::U::Staging::MirrorNewRelease |
Unpack ACeDB | steps/unpack_acedb.pl (manual) | W::U::Staging::UnpackAcedb |
Create BLAST databases | steps/create_blast_databases.pl | W::U::Staging::CreateBlastDatabases |
Create BLAT databases | steps/create_blat_databases.pl | W::U::Staging::CreateBlatDatabases |
Load Genomic GFF databases | steps/load_genomic_gff_databases.pl | W::U::Staging::LoadGenomicGFFDatabases |
Unpack and Load the ClustalW database | steps/unpack_clustalw_database.pl | W::U::Staging::UnpackClustalWDatabase |
Compile Gene Summary resources | steps/compile_gene_resources.pl | W::U::Staging::CompileGeneResources |
Compile Ontology resources | steps/compile_ontology_resources.pl | W::U::Staging::CompileOntologyResources |
Compile Orthology resources | steps/compile_orthology_resources.pl | W::U::Staging::CompileOrthologyResources |
Create commonly requested datasets | steps/dump_annotations.pl | W::U::Staging::DumpAnnotations |
Go Live | steps/go_live.pl | W::U::Staging::GoLive |
Convert GFF2 To GFF3 | steps/convert_gff2togff3.pl | W::U::Staging::ConvertGFF2ToGFF3 |
Precache content | steps/precache_content.pl | W::U::Staging::PrecacheContent |
- Compile orthology resources
- Compile interaction resources
- Build and load GFF patches
- Create a GBrowse-driven genetic map
- Create a GBrowse-driven physical map
Purge old releases
Clear out disk space by throwing away old releases.
./steps/purge_old_releases.sh WSXXX // release to purge; clears out acedb, mysql, support DBs, and staging FTP
Mirror a new release
New releases are mirrored directly from the Hinxton FTP site to the primary WormBase FTP site hosted on wb-dev:/usr/local/ftp. This process is run via cron but can also be run manually.
  # Mirror the next incremental release newer than what we already have (cron):
  ./steps/mirror_new_release.pl

  # Or mirror a specific release:
  ./steps/mirror_new_release.pl --release WS150   # mirrors the WS150 release to /usr/local/ftp/pub/wormbase/releases/WS150
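The actual crontab entry is not recorded in this document; purely as an illustration (the schedule and checkout path are assumptions), it might look like:

  # m h dom mon dow   command   (schedule and checkout path are assumptions)
  15 3 * * *  cd /usr/local/wormbase/website-admin/update/staging && ./steps/mirror_new_release.pl >> /usr/local/wormbase/logs/staging/mirror.log 2>&1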
Unpack Acedb
Unpack AceDB from the new release. Customize the new installation with skeletal files located at /usr/local/wormbase/website/classic/wspec. You will need approximately 25 GB of disk space per release.
  via pipeline: ./steps/unpack_acedb.pl ${RELEASE}
  via helper  : helpers/unpack_acedb.sh ${RELEASE}

  Input : Files staged at ${FTP}/releases/${RELEASE}/species
  Output: Unpacked AceDB files in ${ACEDB}/wormbase_${RELEASE}
When complete, you should have a new acedb directory containing:
  -- database
  -- wgf
  -- wquery
  -- wspec
Test the database by:
  > ps -ax | grep acedb            ## to get the acedb process number
  > kill -9 {AceDB proc number}    ## stop the current acedb process
  > sudo /etc/init.d/xinetd restart
  > saceclient localhost -port 2005
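Two quick, non-destructive checks around the restart (a sketch; the expected directory contents and port 2005 come from the commands above):

  # confirm the unpacked release looks complete
  ls ${ACEDB}/wormbase_${RELEASE}      # expect: database  wgf  wquery  wspec

  # confirm the server is listening again for acedb connections
  netstat -ltn | grep 2005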
Create BLAST databases
Build nucleotide and protein BLAST databases for species with genomic sequence and conceptual translations. In addition, for C. elegans and C. briggsae, we build blast databases for ESTs and "genes" (actually clones).
  Usage : ./steps/create_blast_databases.pl ${RELEASE}
  Input : Genomic sequence and protein FASTA files staged at:
              ${FTP}/releases/species/${SPECIES}.${RELEASE}.genomic.fa.gz
              ${FTP}/releases/species/${SPECIES}.${RELEASE}.protein.fa.gz
          Gene and EST sequences derived from AceDB
  Output: BLAST databases in ${WORMBASE}/databases/${RELEASE}/blast/${SPECIES}
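A simple post-run sanity check is to confirm that every species directory was populated; this sketch only counts files, so it makes no assumption about which BLAST formatter the step used:

  for species in ${WORMBASE}/databases/${RELEASE}/blast/*; do
      printf "%-40s %4d files\n" "$(basename "$species")" "$(ls "$species" | wc -l)"
  done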
Create BLAT databases
Build BLAT databases of genomic sequence.
  Usage : ./steps/create_blat_databases.pl ${RELEASE}
  Input : Genomic sequence FASTA files staged at
              ${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.genomic.fa.gz
  Output: BLAT .nib files in ${WORMBASE}/databases/${RELEASE}/blat/${SPECIES}
Load genomic GFF annotations
Convert GFF files into Bio::DB::GFF (GFF2) or Bio::DB::SeqFeature::Store (GFF3) databases.
  Usage : ./steps/load_genomic_gff_databases.pl ${RELEASE}
  Input : GFF and FASTA files staged at:
              GFF : ${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.gff[2|3].gz
              DNA : ${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.genomic.fa.gz
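For reference, loading a single species by hand with the stock BioPerl loader looks roughly like the sketch below; the pipeline's helper scripts wrap this kind of call, and the database name, temporary paths, and MySQL credentials shown are assumptions:

  # decompress the staged files (paths from the Input list above)
  zcat ${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.gff3.gz       > /tmp/${SPECIES}.gff3
  zcat ${FTP}/releases/species/${SPECIES}/${SPECIES}.${RELEASE}.genomic.fa.gz > /tmp/${SPECIES}.fa

  # create a database and load it with Bio::DB::SeqFeature::Store (GFF3 case; db name is illustrative)
  mysql -e "CREATE DATABASE IF NOT EXISTS ${SPECIES}_${RELEASE}"
  bp_seqfeature_load.pl -c -a DBI::mysql -d ${SPECIES}_${RELEASE} \
      /tmp/${SPECIES}.gff3 /tmp/${SPECIES}.fa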
Unpack and Load the CLUSTALW database
  Usage : ./steps/unpack_clustal_database.pl {WSRELEASE}
  Input : ${FTP}/releases/${RELEASE}/COMPARATIVE_ANALYSIS/wormpep${RELEASE}.clw.sql.bz2
  Output: a new mysql database called clustal_${RELEASE}
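Done by hand, the step is essentially a decompress-and-load; a sketch (MySQL credentials omitted):

  mysql -e "CREATE DATABASE clustal_${RELEASE}"
  bzcat ${FTP}/releases/${RELEASE}/COMPARATIVE_ANALYSIS/wormpep${RELEASE}.clw.sql.bz2 \
      | mysql clustal_${RELEASE}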
Pre-compiled file documentation for Beta
NOTE: the Gene and Interaction data resources no longer need to be pre-compiled.
Compile Ontology Resources
TODO: This step only copies files into place. It could probably be avoided, but there may be a file-access issue.
  In Wormbase.conf:
      association_count_file = /usr/local/wormbase/databases/%s/ontology/%s_association.%s.wb
Copy over the mirrored ontology files from the FTP directory; these files are used to calculate the counts of associated terms in the Ontology Browser (a manual sketch follows the output list below).
  Usage : ./steps/compile_ontology_resources.pl ${RELEASE}
  Input : WB files staged at /usr/local/ftp/pub/wormbase/releases/WSXXX/ONTOLOGY
- anatomy_association.RELEASE.wb
- gene_association.RELEASE.wb
- phenotype_association.RELEASE.wb
Output : to ${WORMBASE}/databases/${RELEASE}/ontology:
- anatomy_association.RELEASE.wb
- gene_association.RELEASE.wb
- phenotype_association.RELEASE.wb
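Since the step only copies the association files into place, the manual equivalent is a plain copy; a sketch using the path variables from Document Conventions:

  mkdir -p ${WORMBASE}/databases/${RELEASE}/ontology
  cp ${FTP}/releases/${RELEASE}/ONTOLOGY/*_association.${RELEASE}.wb \
     ${WORMBASE}/databases/${RELEASE}/ontology/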
Compile Orthology Resources
  Usage : ./steps/compile_orthology_resources.pl ${RELEASE}
  Input : omim.txt and morbidmap files from OMIM
Intermediate: gene2omim.txt (query AceDB and build the worm gene to human disease relationship based on its human ortholog)
  Output: Files ${WORMBASE}/databases/${RELEASE}/orthology
      Disease.ace (load into Xapian, use for disease/search and disease page)
Compile Gene Resources
BROKEN
Create precompiled gene page files specifically to populate the Phenotype tables.
  Usage : ./steps/compile_gene_resource.pl ${RELEASE}
  Input : AceDB data
  Output: Files ${WORMBASE}/databases/${RELEASE}/gene
- gene_rnai_pheno.txt (gene/gene)
- gene_xgene_pheno.txt (gene/gene)
- phenotype_id2name.txt (gene/gene)
- rnai_data.txt (gene/gene)
- variation_data.txt (gene/gene)
Compile Ontology Resources
TODO: This step relies on a number of external helper scripts that should ALL be folded into CompileGeneResources. They are located at
staging/helpers/gene_summary
Take the mirrored ontology files and compile them into the databases for the ontology searches.
  Usage : ./steps/compile_ontology_resources.pl ${RELEASE}
  Input : OBO files staged at /usr/local/ftp/pub/wormbase/releases/WSXXX/ONTOLOGY
          compiled data files from the Compile Gene Resources step
- anatomy_association.RELEASE.wb
- anatomy_ontology.RELEASE.obo
- gene_association.RELEASE.wb
- gene_association.RELEASE.wb.cb
- gene_association.RELEASE.wb.ce
- gene_association.RELEASE.wb.cjp
- gene_association.RELEASE.wb.ppa
- gene_association.RELEASE.wb.rem
- gene_ontology.RELEASE.obo
- phenotype_association.RELEASE.wb
- phenotype_ontology.RELEASE.obo
Output : to ${WORMBASE}/databases/${RELEASE}/ontology:
- id2association_counts.txt (ontology/tree_lister)
- id2name.txt (ontology/tree_lister)
- id2parents.txt (ontology/tree_lister)
- id2total_associations.txt (ontology/tree_lister)
- name2id.txt
- search_data.txt
- parent2ids.txt (ontology/tree_lister)
Compile Orthology Resources
Create precompiled files for orthology and disease display and search.
This MUST be run after the ontology step above.
  Usage : ./steps/compile_orthology_resources.pl ${RELEASE}
  Input : AceDB data, omim.txt and morbidmap files from OMIM, ontology resources files
- gene_association.$RELEASE.wb.ce
Intermediate:
- all_proteins.txt
- disease_page_data.txt
- full_disease_data.txt
- gene_id2go_bp.txt
- gene_id2go_mf.txt
- gene_id2phenotype.txt
- gene_list.txt
- hs_proteins.txt
- last_processed_gene.txt
- omim2disease.txt
- omim_id2go_ids.txt
- omim_id2phenotypes.txt
- omim_id2disease_synonyms.txt
- omim_reconfigured.txt
- ortholog_other_data.txt
- ortholog_other_data_hs_only.txt
Output : Files ${WORMBASE}/databases/${RELEASE}/orthology (the page that uses each file is given in parentheses)
- disease_search_data.txt (orthology/search)
- gene_id2omim_ids.txt (orthology/disease)
- go_id2omim_ids.txt (orthology/disease,ontology/gene)
- go_ids2descendants.txt (orthology/gene)
- hs_ensembl_id2omim.txt (orthology/gene)
- id2name.txt (orthology/disease, orthology/gene)
- name2id.txt (orthology/disease)
- omim_id2all_ortholog_data.txt (orthology/disease)
- omim_id2disease_desc.txt (orthology/disease)
- omim_id2disease_name.txt (orthology/disease,ontology/gene)
- omim_id2disease_notes.txt (orthology/disease)
- omim_id2disease.txt (orthology/gene)
- omim_id2gene_name.txt (orthology/search)
Compile Interaction Data
DEPRECATED. NO NEED TO MIGRATE THIS INTO THE NEW STAGING PIPELINE.
Create precompiled gene page files specifically to populate interaction listing pages.
  Usage : ./steps/compile_interaction_data.pl ${RELEASE}
  Input : AceDB interaction data
  Output: Files ${WORMBASE}/databases/${RELEASE}/interaction
- compiled_interaction_data.txt
Convert GFF2 into GFF3
Usage: ./steps/convert_gff2_to_gff3.pl ${RELEASE}
Create files of commonly requested datasets
  Usage : ./steps/dump_annotations.pl {WSRELEASE}
  Output: datasets in ${FTP}/releases/${RELEASE}/annotations and species/annotations
The staging harness will automatically run scripts in annotation_dumpers/*. These scripts should abide by the following conventions:
  1. Be located in update/staging/annotation_dumpers
  2. Be named either
        dump_species_*   for species-level data (like brief IDs)
        dump_resource_*  for resource-level data (like laboratories)
  3. Follow existing examples, including available parameters.
  4. Dump to STDERR and STDOUT.

  Notes:

  1. dump_species_* will be called for each species managed by WormBase and will end up in
        ${FTP_ROOT}/releases/[RELEASE]/species/[G_SPECIES]/annotation/[G_SPECIES].[RELEASE].[DESCRIPTION].txt
     dump_resource_* will be called once and end up in
        ${FTP_ROOT}/datasets-wormbase/wormbase.[RELEASE].[DESCRIPTION].txt
  2. The filename will be created by stripping off dump_species_ or dump_resource_.
     Species-specific resources will be prepended with the appropriate species.
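The filename rule in note 2 can be seen with a small shell sketch; the dumper script name and species used here are hypothetical examples:

  script=dump_species_brief_ids.pl     # hypothetical dumper script name
  desc=${script#dump_species_}         # strip the prefix -> brief_ids.pl
  desc=${desc%.pl}                     # strip the suffix -> brief_ids
  g_species=c_elegans                  # species the dumper is being run for
  echo "${FTP_ROOT}/releases/${RELEASE}/species/${g_species}/annotation/${g_species}.${RELEASE}.${desc}.txt"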
Create a GBrowse-driven genetic map
Notes...
Usage: ./steps/load_gmap_gffdb.pl ${RELEASE}
Go Live
steps/go_live.pl WSXXX
This script will
- create a series of symlinks in the FTP site (for example, to maintain the virtually organized species/ directory)
- create "current" symlinks in the FTP site for easy access.
- adjust symlinks to mysql GFF databases updated this release.
- adjust the symlink at /usr/local/wormbase/acedb/wormbase to point to the newly unpacked wormbase_WSXXX acedb.
- Sync the staging FTP site to the production FTP site.
If you omit the WSXXX on the command line, the script will simply organize the virtual directories on the ftp site up to and including the current release. MySQL and AceDB symlinks will not be created.
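The individual commands are not recorded here, but the symlink adjustments the script performs amount to something like this sketch (the exact link names are assumptions based on the paths used elsewhere in this document):

  # point the live acedb directory at the newly unpacked release
  ln -sfn ${ACEDB}/wormbase_${RELEASE} ${ACEDB}/wormbase

  # example "current" convenience symlink on the FTP site
  ln -sfn ${FTP}/releases/${RELEASE} ${FTP}/releases/current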
Branch the web code
For each major WS release, create a corresponding branch in the git repository. We branch the moment a new release is staged so that we can begin development for that model. This can be done from any repository.
  staging> cd /usr/local/wormbase/website/production
  staging> git pull

  // Creating a tag...
  // staging> git tag -a -m "WSXXX" WSXXX HEAD
  // staging> git push --tags

  // Create a new branch, one tracking the remote master repository
  staging> git branch --track WSXXX origin/master
  staging> git branch              // list all branches

  // Push the branch to the remote repository
  staging> git push origin WSXXX
  staging> git push
Steps to execute after a release has been staged
Precache content
Once a release has been successfully staged and tested, we pre-cache select computationally intensive content to a CouchDB instance located on the development server.
Precaching works as follows.
1. The primary Catalyst configuration file is read.
2. For each widget set to "precache = true" in config, REST requests will be constructed against staging.wormbase.org. This will be running the NEW version of WormBase.
3A. The webapp returns HTML; the precache script stores it in the reference (production) couchdb.
OR
3B. The web app on staging.wormbase.org will automatically cache the result in the reference couchdb (currently web6); the couchdb that is written to can be configured in wormbase.conf.
4. The reference couchDB will then be replicated during production release to each node, scaling horizontally.
5. During a production cycle, additional content will be stored in the reference couchdb; this is synced periodically to each node.
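To exercise a single widget by hand in the same way the precache script does, you can fetch it from the staging webapp and store the HTML in the reference couchdb; a sketch using curl, where the gene ID, widget name, REST URL scheme, and couchdb database/document names are illustrative assumptions (web6 comes from step 3B above; 5984 is the default CouchDB port):

  # fetch one rendered widget from the staging webapp (gene and widget names are examples)
  curl -s http://staging.wormbase.org/rest/widget/gene/WBGene00006763/overview > widget.html

  # store it as an attachment on a document in the reference couchdb
  # (database name and document ID are illustrative)
  curl -s -X PUT -H 'Content-Type: text/html' --data-binary @widget.html \
       http://web6:5984/precache_wsxxx/gene_WBGene00006763_overview/widget.html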
See CouchDB for details.
Purge old releases
To purge previous releases from the production and staging nodes,
staging/steps/purge_old_releases.pl --release WSXXX
This will remove the following:
  /usr/local/wormbase/acedb/wormbase_WSXXX
  /usr/local/wormbase/databases/WSXXX
  /usr/local/mysql/data/WSXXX
And on the staging host:

  /usr/local/ftp/pub/wormbase/releases/WSXXX
Compiled file documentation and plans
Step | File | Description | WB2 update
---|---|---|---
Compile Gene Resources | gene_rnai_pheno.txt | Many-to-many listing of Gene_ids to RNAi_ids and related phenotype ID (or not); used in the classic gene summary for phenotype tables | TODO: update the appropriate method in Gene.pm to pull data for a given gene directly from Ace
Compile Gene Resources | gene_xgene_pheno.txt | Many-to-many listing of Gene_ids to Transgene_ids and related phenotype ID (or not); used in the classic gene summary for phenotype tables | TODO: update the appropriate method in Gene.pm to pull data for a given gene directly from Ace
Compile Gene Resources | phenotype_id2name.txt | Listing of Phenotype_ids to Phenotype names; used in the classic gene summary for phenotype tables to obviate the extraction of individual phenotype objects and their names | TODO: function will be deprecated since individual phenotype objects will be extracted
Compile Gene Resources | rnai_data.txt | Listing of RNAi data for the RNAi table in gene/gene; used in the classic gene summary for RNAi tables | TODO: update the appropriate method in Gene.pm to pull data for a given gene directly from Ace
Compile Gene Resources | variation_data.txt | Many-to-many listing of Gene_ids to RNAi_ids and related phenotype ID (or not); used in the classic gene summary for phenotype tables | TODO: update the appropriate method in Gene.pm to pull data for a given gene directly from Ace
Compile Ontology Resources | id2association_counts.txt | Listing of ontology object ids (GO, Anatomy_term, Phenotype) to the number of annotations to the term; used in tree_lister (browser) | Retain for browser; move into tied hash?
Compile Ontology Resources | id2name.txt | Listing of ontology object ids (GO, Anatomy_term, Phenotype) to the term; used in tree_lister (browser) | Retain for browser; move into tied hash?
Compile Ontology Resources | id2parents.txt | One-to-many listing of ontology object ids (GO, Anatomy_term, Phenotype) to the parent terms and respective relationship; used in tree_lister (browser) | Retain for browser; move into tied hash?
Compile Ontology Resources | id2total_associations.txt | Listing of ontology object terms (GO, Anatomy_term, Phenotype) to the id; used in tree_lister (browser) | Retain for browser; move into tied hash?
Compile Ontology Resources | search_data.txt | Pipe-delineated data on each term including synonyms and annotations; used in GO, Anatomy_term, and Phenotype searches | To be superseded by Xapian search
Compile Ontology Resources | parent2ids.txt | One-to-many listing of ontology object ids (GO, Anatomy_term, Phenotype) to their immediate descendant term ids; used in tree_lister (browser) | Retain for browser; move into tied hash?
Compile Orthology Resources | disease_search_data.txt | Pipe-delineated file containing details on the diseases extracted from OMIM; used in disease search | Use data for Xapian search; work with Abby
Compile Orthology Resources | gene_id2omim_ids.txt | One-to-many listing of gene_ids to OMIM IDs; used in orthology/disease | Keep for disease object
Compile Orthology Resources | go_id2omim_ids.txt | One-to-many listing of gene_ids to OMIM IDs; used in orthology/disease and ontology/gene | Useful for further paralog data expansion and integration
Compile Orthology Resources | go_ids2descendants.txt | One-to-many listing of GO ids to the list of GO ids of their descendants; the plan was to use this data for paralog display in orthology/gene | Useful for further paralog data expansion and integration
Compile Orthology Resources | hs_ensembl_id2omim.txt | One-to-one listing of hs ensembl ids to OMIM ids; used in orthology/gene | Disease UI
Compile Orthology Resources | id2name.txt | Listing of ontology object ids (GO, Anatomy_term, Phenotype) to the term; used in orthology/disease and orthology/gene | Useful for further paralog data expansion and integration(?)
Compile Orthology Resources | name2id.txt | Listing of ontology object terms (GO, Anatomy_term, Phenotype) to the id; used in orthology/disease | Useful for further paralog data expansion and integration(?)
Compile Orthology Resources | omim_id2all_ortholog_data.txt | Pipe-delineated file containing details of the ortholog associated with the OMIM id; used in orthology/disease | Use to generate Xapian data; work with Abby
Compile Orthology Resources | omim_id2disease_desc.txt | One-to-one listing of OMIM ids and the disease description; used in orthology/disease | Use in Disease object model and UI
Compile Orthology Resources | omim_id2disease_name.txt | One-to-one listing of OMIM ids and the disease name; used in orthology/disease | Use in Disease object model and UI
Compile Orthology Resources | omim_id2disease_notes.txt | One-to-one listing of OMIM ids and the disease notes from OMIM; used in orthology/disease | Use in Disease object model and UI
Compile Orthology Resources | omim_id2disease.txt | One-to-one listing of OMIM ids and the disease names; used in orthology/disease | Use in Disease object model and UI
Compile Orthology Resources | omim_id2gene_name.txt | One-to-many listing of OMIM ids to gene names; used in orthology/search | Probably deprecate in updating Disease object model