Documentation for workflow and scripts

From WormBaseWiki

Revision as of 22:32, 12 June 2015

The following files must be written in the cgi-bin directory, or wherever the concise description scripts are located:

 db_ip.txt              contains the IP address of the SQL database that holds much of the daily updated information for WormBase.
 html.txt               contains the location of the docroot (or /var/www) directory for the output.
 parallel_path.txt      contains the path where GNU Parallel is installed.
 cgi.txt                contains the location of the scripts.
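A minimal bash sketch for writing these four files. It assumes each file holds a single value on one line, which this page does not spell out. The helper name `write_config` is hypothetical.

```shell
# Hypothetical helper: write the four one-line configuration files read by
# the concise description scripts.
# ASSUMPTION: each file holds exactly one value followed by a newline.
write_config() {
  local dir="$1" db_ip="$2" html_dir="$3" parallel_path="$4" cgi_dir="$5"
  mkdir -p "$dir"
  printf '%s\n' "$db_ip"         > "$dir/db_ip.txt"          # IP address of the SQL database
  printf '%s\n' "$html_dir"      > "$dir/html.txt"           # docroot for the output
  printf '%s\n' "$parallel_path" > "$dir/parallel_path.txt"  # where GNU Parallel is installed
  printf '%s\n' "$cgi_dir"       > "$dir/cgi.txt"            # location of the scripts
}
```

For example, `write_config . 192.0.2.10 /var/www /usr/local/bin "$PWD"`, run from the scripts directory. All four values here are placeholders; 192.0.2.10 is a documentation-only address.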

Make sure that the following files in the docroot (or /var/www) directory where the data are stored give the current production release, the release of the sources and the list of WormBase-supported species:

 production_release.txt holds the production release for the concise description output.
 release.txt            holds the release information for the sources.
 species.txt            lists, as tab-separated values, the species abbreviation, project name,
                        full name and gene prefix for each species, for example:
                        c_briggsae	PRJNA10731	Caenorhabditis briggsae	Cbr
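To check that species.txt is well formed, a short bash sketch can read it back one row at a time. Setting `IFS=$'\t'` splits on tabs only, so a full name such as "Caenorhabditis briggsae" stays in one field. The helper name `list_species` and its output format are illustrative and not part of the WormBase scripts.

```shell
# Hypothetical helper: print one summary line per row of species.txt.
# Columns: abbreviation, project name, full name, gene prefix (tab-separated).
list_species() {
  local species_file="$1"
  local abbrev project full_name prefix
  # Split on tabs only, so spaces inside the full name are kept.
  while IFS=$'\t' read -r abbrev project full_name prefix; do
    [[ -z "$abbrev" ]] && continue   # skip blank lines
    printf '%s: %s, %s (gene prefix %s)\n' "$abbrev" "$full_name" "$project" "$prefix"
  done < "$species_file"
}
```

Run from the docroot, `list_species species.txt` prints `c_briggsae: Caenorhabditis briggsae, PRJNA10731 (gene prefix Cbr)` for the example row above.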

First, change to the directory that contains the scripts (the location recorded in cgi.txt).

Then create the directories needed by the scripts for a given production release:

$ ./create_release_directories_parallel.pl WS250
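The release name (WS250 in the command above) is typed by hand at several steps. A minimal sketch that reads it from production_release.txt instead follows. It assumes that file contains only the release name. The `read_release` helper is hypothetical, and the `WS<number>` check is an assumption based on names like WS250.

```shell
# Hypothetical helper: read and sanity-check the production release name.
# ASSUMPTION: production_release.txt contains only a name such as WS250.
read_release() {
  local docroot="$1" release
  # Remove all whitespace, including the trailing newline.
  release=$(tr -d '[:space:]' < "$docroot/production_release.txt")
  if [[ ! "$release" =~ ^WS[0-9]+$ ]]; then
    echo "unexpected release name: '$release'" >&2
    return 1
  fi
  printf '%s\n' "$release"
}
```

A possible usage is `release=$(read_release "$(cat html.txt)") && ./create_release_directories_parallel.pl "$release"`. This assumes html.txt holds only the docroot path. The `&&` ensures the script is not started with an empty or malformed release name.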

Several input files must be downloaded and formatted before the descriptions can be generated; run the following scripts:

$ ./biomart_query.pl
$ ./download_gene_lists_elegans.pl
$ ./download_gene_associations_parallel_all.pl
$ ./go_terms_only.pl
$ ./get_alt_id_terms_only.pl
$ ./go_obo_to_go_ace.pl
$ ./list_dead_genes.pl
$ ./create_curated_gene_list.pl
$ ./list_uncurated_genes.pl
$ ./create_gene_list.pl
$ ./parse_gene_lists_elegans.pl
$ ./parse_orthologs_all_parallel.pl
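If one of these preparation scripts fails, the later ones may run on incomplete input. A bash sketch that runs scripts one at a time, logs each start, and stops at the first failure follows. The helper name `run_in_order` is hypothetical. It calls each script as `./name`, so it must be run from the scripts directory.

```shell
# Hypothetical helper: run the named scripts in the given order and stop
# at the first one that exits with a non-zero status.
run_in_order() {
  local script
  for script in "$@"; do
    echo "[$(date '+%F %T')] running $script" >&2
    if ! "./$script"; then
      echo "[$(date '+%F %T')] $script failed; stopping" >&2
      return 1
    fi
  done
}
```

To use it, pass the twelve script names listed above as arguments, in the same order, starting with `biomart_query.pl` and ending with `parse_orthologs_all_parallel.pl`.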

Then create the concise descriptions. The scripts are chained with && so that each one runs only if the previous one succeeded:

$ ./create_GO_sentences_elegans_species_parallel_all.pl &&
  ./create_sentence_multiple_orthologs_species_all_parallel_all.pl &&
  ./create_GO_sentences_species_parallel_all.pl &&
  ./concatenate_sentences_species_parallel_all.pl &&
  ./generate_OA_concise_descriptions_parallel_all.pl

Finally, write a report giving the number of concise descriptions for each species in a given production release:

$ ./total_description_count.pl WS250