Importing Protein Structure Data
Wormbase contains links to genomics centers which perform protein structure determination. The records that point to worm proteins are acquired from TargetDB (http://targetdb.pdb.org/) and PDB (http://www.rcsb.org/). The protein sequences in these records are mapped to Wormbase proteins (Wormpep) and they are displayed in the context of the Wormbase gene page.
The following pipeline script and config file acquire the data, map it to Wormpep, and convert the results to Ace format:
$WORMBASE/util/import_export/compile_3d_data/
    compile_3d_data.pl
    compile_3d_data.cfg
To update the structure data pointers in Wormbase:
1) Run compile_3d_data.pl.
Run the script with no parameters for usage information. It expects an empty directory to be specified; the files for every step of the process are generated in that directory, along with a log file and a status file. The status file is used internally to resume the pipeline from where it left off after a failure. The final output of the script is an Ace file.
2) Load the Ace file into cshace.
- Copy over new ace file to archive
brie3:/usr/local/acedb/datasets/structure_data/ (Currently this directory is compressed into structure_data.tgz)
- Remove existing structure data objects.
    tace dumpdb ...
    acedb> find Structure_data
    // Found 14876 objects in this class
    // 14876 Active Objects
    acedb> find Structure_data "Wbstructure*"
    // Found 14876 objects in this class
    // 14876 Active Objects
    acedb> kill
    // Do you really want to destroy these 14876 objects (y or n)
    y
    // 0 Active Objects
    acedb> save
- Load new ace file
    acedb> parse /usr/local/acedb/datasets/structure_data/structure_<date_time>.ace
    // Parsing file /usr/local/acedb/datasets/structure_data/structure_<date_time>.ace
    // objects processed: 30474 found, 30474 parsed ok, 0 parse failed
    // 30474 Active Objects
    acedb> save
    // 30474 Active Objects
    acedb> quit
    tace dumpdb
- Make a dump of the new cshace
make (this creates a cshl_dump_YYYY-MM-DD.tar.gz in dumpdb/)
- Upload the cshace dump to Sanger (see ftp_to_sanger.pl)
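The parse summary that tace prints in the load step ("objects processed: N found, N parsed ok, 0 parse failed") can be checked mechanically before the dump is made and uploaded. The following is a minimal Python sketch, assuming the tace output has been captured as text; the function names parse_summary and load_succeeded are hypothetical helpers and are not part of compile_3d_data.pl or acedb.

```python
import re

# Matches the summary line tace prints after "parse", as shown in the load step.
SUMMARY_RE = re.compile(
    r"objects processed:\s*(\d+)\s+found,\s*(\d+)\s+parsed ok,\s*(\d+)\s+parse failed"
)


def parse_summary(tace_output):
    """Return (found, parsed_ok, failed) from captured tace output.

    Returns None when no summary line is present.
    """
    match = SUMMARY_RE.search(tace_output)
    if match is None:
        return None
    return tuple(int(n) for n in match.groups())


def load_succeeded(tace_output):
    """True only if a summary line exists, nothing failed, and every
    object that was found was also parsed."""
    summary = parse_summary(tace_output)
    if summary is None:
        return False
    found, parsed_ok, failed = summary
    return failed == 0 and found == parsed_ok


if __name__ == "__main__":
    sample = "// objects processed: 30474 found, 30474 parsed ok, 0 parse failed"
    print(parse_summary(sample))
    print(load_succeeded(sample))
```

A missing summary line is treated as a failure here, on the assumption that an interrupted session should not be followed by a dump.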
- compile_3d_data.pl accepts a command-line option (idx_ace) that names an earlier MERGED.ace file. When this option is provided, the pipeline preserves the previous WBStructure ids.
- Use the "structure_<yymmdd_hhmm>.ace" naming convention for ace files generated by compile_3d_data.pl.
- Before loading new data into cshace, update the wspec/models.wrm file to the current models file.
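The "structure_<yymmdd_hhmm>.ace" naming convention from the notes above can be produced and checked with a few lines of Python. This is only an illustrative sketch: structure_ace_filename and follows_convention are hypothetical names, and the sketch assumes that yymmdd_hhmm means a two-digit year, month, and day followed by a 24-hour hour and minute.

```python
import re
from datetime import datetime

# Assumed reading of the convention: structure_<yymmdd_hhmm>.ace
NAME_RE = re.compile(r"^structure_\d{6}_\d{4}\.ace$")


def structure_ace_filename(when=None):
    """Build an ace file name for the given datetime (default: now)."""
    if when is None:
        when = datetime.now()
    return "structure_" + when.strftime("%y%m%d_%H%M") + ".ace"


def follows_convention(filename):
    """Check that a file name has the structure_<yymmdd_hhmm>.ace shape."""
    return NAME_RE.match(filename) is not None


if __name__ == "__main__":
    name = structure_ace_filename()
    print(name, follows_convention(name))
```

The check only looks at the shape of the name; it does not verify that the digits form a valid date.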