Difference between revisions of "Phenote .ace citace upload"

From WormBaseWiki
 
(24 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[http://www.wormbase.org/wiki/index.php/Caltech_documentation back]
+
[http://www.wormbase.org/wiki/index.php/Caltech_documentation back]  
  
= Dumping Phenote into .ace file =
+
= Dumping phenotype app_files into .ace =
 +
Two dumping scripts are required to capture all of the allele phenotype data, both on tazendra:
 +
/home/acedb/karen/WS_upload_scripts/phenotype/use_package.pl*<br>
 +
/home/acedb/karen/WS_upload_scripts/paper_object/get_paper_object.pl*
  
The phenote-> .ace dump script is called ./use_package.pl  and is on tazendra in /home/acedb/work/allele_phenotype.
+
allele_phenotype and paper_object are dumped on a weekly cron job Sundays at 4am<br>
This script takes about 10 minutes and will create two files: ‘allele_phenotype.ace.<date> and err.out.<date>.
+
see /home/acedb/cron/crontab_20140703
NOTE: these files will accumulate over time so make sure to clear them out every once in a while.  
+
0 4 * * sun /home/acedb/karen/WS_upload_scripts/paper_object/get_paper_object.pl
 +
0 4 * * sun cd /home/acedb/karen/WS_upload_scripts/transgene; ./use_package.pl
 +
  0 4 * * sun cd /home/acedb/work/allele_phenotype/; ./use_package.pl
  
: ssh  acedb@tazendra.caltech.edu
 
Pswd
 
: cd /home/acedb/work/allele_phenotype
 
: ./use_package.pl
 
  
New allele_phenotype.ace and err.out files will be created as noted by date at the end of each. Send them to your computer
+
'''/home/acedb/karen/WS_upload_scripts/phenotype/use_package.pl''' :
e.g. Allele_Phene_Dump on scoobydoo:
+
This script creates 5  files:
: scp err.out.20080407 yook@scoobydoo.caltech.edu:/Users/Yook/WS_latest/ace_phenote_dump 
+
*allele_phenotype.ace.<timestamp>
pswd
+
*var_phen.ace
: scp allele_phenotype.ace.20080407 yook@scoobydoo.caltech.edu:/Users/Yook/WS_latest/ace_phenote_dump
+
*mol_phene.ace.<timestamp>
pswd
+
*mol_phene.ace
 +
*err.out.<timestamp>
  
Open err.out to see if any errors were generated during the dump process.
+
var_phen.ace and mol_phen.ace are copies of allele_phenotype.ace and mol_phene.ace.<br>
 +
var_phen.ace and mol_phen.ace are picked up by spica on a cron job that is run every Monday at 8am.
  
Errors in .ace file generated form should be indicated by "ERROR" in the .ace output, or as comments at the top of the form.
+
'''/home/acedb/karen/WS_upload_scripts/paper_object/get_paper_object.pl*''':<br>
  
<left over from CGI .ace dump instructions> Is this still necessary?
+
This script creates a file that will associate phenotype annotated alleles with their respective genes. <br>
Make sure that there are no obsolete phenotype annotations:
+
This is needed so references from phenotype annotations are added to the references on the respective gene page.
- Script for finding obsolete terms is find_obsolete.pl at path listed below on tazendra:
+
And example from this file:  
/home/Postgres/work/citace_upload/allele_phenotype/
+
Paper : WBPaper00000005
 +
Gene WBGene00000898
 +
Allele "WBVar00143949" Inferred_Automatically "Inferred automatically from curated phenotype"
 +
Gene WBGene00004015
 +
Allele "WBVar00000103" Inferred_Automatically "Inferred automatically from curated phenotype"
 +
Gene WBGene00004015
 +
Allele "WBVar00000103" Inferred_Automatically "Inferred automatically from curated phenotype"
  
= Testing Phenote .ace for upload =
+
= Dumping allele_paper connections into .ace file  =
  
Evaluate how the uploaded data impacts WB by comparing object numbers before and after data upload. This means that you need access to the latest WS and have the means to edit the WS to read the .ace dump.  
+
is at&nbsp;:<br>/home/acedb/..../get_paper_object.pl
  
== Test file on spica ==
+
Run it by going to that directory and redirecting output to a file  
  
Send new allele_phenotype.ace to kyook on spica:
+
  ./get_paper_object.pl &gt; file
  : scp <.ace dump files> kyook@spica.caltech.edu:/home3/kyook/alle_phen.ace.latest
 
  
== Test file locally on Acedb ==
+
If there are errors, which is rare,  they will look something like:
  
This step is to test two things:  
+
// ERR
First, that the file is readable.
+
There's an object that has no name and no type, but still has a paper.
Second, that the file does not alter any objects it shouldn’t.  
+
 +
It has postgres database ID 2712 testdb=&gt; SELECT * FROM app_tempname WHERE joinkey = '2712'; joinkey | app_tempname 
 +
  |app_timestamp
 +
----
 +
+--------------+---------------
 +
...
 +
----
 +
+------------------+---------------
 +
...
 +
----
 +
+-----------------+-------------------------------
 +
2712 | WBPaper00002087 | 2006-07-11 11:52:20.208295-07 (1 row) Here's all the data for it when it got migrated
 +
from the  CGI&nbsp;
 +
 +
 
 +
= Testing Phenote .ace for upload  =
 +
 
 +
Evaluate how the uploaded data impacts WB by comparing object numbers before and after data upload. This means that you need access to the latest WS and have the means to edit the WS to read the .ace dump.<br>
 +
 
 +
== Test file locally on Acedb  ==
 +
 
 +
This step is to test two things: First, that the file is readable. Second, that the file does not alter any objects it shouldn’t.  
  
 
Launch local acedb:  
 
Launch local acedb:  
 +
 
  $ cd Desktop/acedb
 
  $ cd Desktop/acedb
  $ ./xace /Users/Yook/WS_latest/WS188
+
  $ ./xace /.../WSXXX/acedb
 +
 
 +
Comparing data builds: From the latest build, record
 +
 
 +
#Number '''strain objects'''
 +
#Number '''life stage objects'''
 +
#Number '''anatomy terms'''
 +
#Number '''variation''''''-phenotype connection'''
 +
 
 +
use either AQL query:
 +
 
 +
select all class variation where exists_tag -&gt;phenotype select p, p-&gt; variation from p in class
 +
phenotype where exists_tag  p-&gt;variation
  
Comparing data builds:
 
From the latest build, record
 
#Number '''strain objects'''
 
#Number '''life stage objects'''
 
#Number '''anatomy terms'''
 
#Number '''allele-phenotype connection''' 
 
use either AQL query
 
select all class variation where exists_tag ->phenotype
 
select a, a-> variation from a in class phenotype where exists_tag a ->variation
 
 
:5. Number '''variations that are alleles:'''
 
:5. Number '''variations that are alleles:'''
use WQL
+
 
 +
use WQL  
 +
 
 
  find variation variation_type=allele
 
  find variation variation_type=allele
or AQL
+
 
  select g from g in class variation where exists_tag g-> allele  
+
or AQL  
 +
 
 +
  select a from a in class variation where exists_tag a-&gt;allele  
 +
 
 
:6. Number '''alleles with a phenotype'''
 
:6. Number '''alleles with a phenotype'''
Find all alleles with a phenotype
 
select a from a in class variation where exists_tag->phenotype
 
  
Checking that the .ace dump file is readable:  
+
Find all alleles with a phenotype
Load in .ace dump
+
 
Hit ‘Edit..‘ button
+
select v from v in class variation where exists_tag v-&gt;phenotype
Choose ‘Read .ace file’
+
 
Accept change in write priviledges
+
Check that the .ace dump file is readable:  
Select ‘Open ace file’
+
 
Find and select file to open
+
to load in .ace dump  
Select ‘Read all’
+
 
 +
Select ‘Edit..‘ ‘Read .ace file’, &nbsp;Accept change in write privileges, &nbsp;Select ‘Open ace file’, &nbsp;Find and select file to open, &nbsp;Select ‘Read all’  
 +
 
 +
If the .ace file is okay, then 100% of the lines will have been read in. This will be noted in the second line called ‘Line:’. If the dump produced a bad file, the read in will stop at the point where the problem occurred. You can go to that line in the .ace file and check it out.
 +
 
 +
Check that the file does not alter any objects it shouldn’t:
 +
 
 +
Once .ace dump is loaded in to acedb, &nbsp;redo counts for all objects as above. Compare object numbers between latest database and new .ace file to make sure data in the dump looks reasonable, i.e., no lost data or inflated numbers.
 +
 
 +
= Uploading .ace for Wen into Citace  =
  
If the .ace file is okay, then 100% of the lines will have been read in.  This will be noted in the second line called ‘Line:’. If the dump produced a bad file, the read in will stop at the point where the problem occurred.  You can go to that line in the .ace file and check it out. 
+
Deposit .ace for Wen in citace:  
  
Checking that the file does not alter any objects it shouldn’t:
+
scp var_phen.ace xxxx@xxxx.caltech.edu:/.../Data
Once .ace dump loaded in redo counts for all objects as above.  
 
Compare object numbers between latest database and new .ace file to make sure data in the dump looks reasonable, i.e., no lost data or inflated numbers.
 
  
= Uploading .ace for Wen into Citace =
 
  
Deposit .ace for Wen in citace:
+
[[Category:Phenotype Curation]]
scp XXXXX citace@altair:~/Data_for_citace/Data_from_<YOU> /home/citace/Data_for_citace/Data_from_<YOU>/ <allele_phenotype_dump_WSxxx.ace>
 

Latest revision as of 19:47, 17 May 2018


Dumping phenotype app_files into .ace

Two dumping scripts are required to capture all of the allele phenotype data, both on tazendra:

/home/acedb/karen/WS_upload_scripts/phenotype/use_package.pl*
/home/acedb/karen/WS_upload_scripts/paper_object/get_paper_object.pl*

allele_phenotype and paper_object are dumped by a weekly cron job that runs Sundays at 4am; see /home/acedb/cron/crontab_20140703:

0 4 * * sun /home/acedb/karen/WS_upload_scripts/paper_object/get_paper_object.pl
0 4 * * sun cd /home/acedb/karen/WS_upload_scripts/transgene; ./use_package.pl
0 4 * * sun cd /home/acedb/work/allele_phenotype/; ./use_package.pl
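To read the schedule off these entries, recall that a crontab line has five schedule fields (minute, hour, day of month, month, day of week) followed by the command, so "0 4 * * sun" means 04:00 every Sunday. The following is a minimal Python sketch that splits one entry into those parts; the function name split_cron_line is hypothetical and not part of any script on tazendra.

```python
# Names of the five schedule fields in a standard crontab entry.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_line(line):
    """Return (schedule dict, command string) for one crontab entry."""
    # Split on whitespace at most five times: five fields, then the command.
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields and a command: %r" % line)
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


entry = "0 4 * * sun cd /home/acedb/work/allele_phenotype/; ./use_package.pl"
schedule, command = split_cron_line(entry)
print("runs at hour", schedule["hour"], "minute", schedule["minute"],
      "on", schedule["day_of_week"])
print("command:", command)
```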


/home/acedb/karen/WS_upload_scripts/phenotype/use_package.pl : This script creates 5 files:

  • allele_phenotype.ace.<timestamp>
  • var_phen.ace
  • mol_phene.ace.<timestamp>
  • mol_phene.ace
  • err.out.<timestamp>

var_phen.ace and mol_phene.ace are undated copies of allele_phenotype.ace.<timestamp> and mol_phene.ace.<timestamp>, respectively.
These two undated files are picked up by spica by a cron job that runs every Monday at 8am.
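Before the Monday pickup it can be useful to confirm that each undated copy really matches the newest timestamped dump. The sketch below is one way to do that in Python, using the file names from the list of five files above. It assumes the timestamp suffix sorts lexicographically (for example a YYYYMMDD date); the helper names and the demo timestamps are placeholders, not taken from the dump script.

```python
import filecmp
import glob
import os
import tempfile


def latest_dump(directory, prefix):
    """Return the newest '<prefix>.<timestamp>' path in directory, or None.

    Assumes timestamps sort lexicographically (e.g. YYYYMMDD).
    """
    pattern = os.path.join(glob.escape(directory), prefix + ".*")
    candidates = sorted(glob.glob(pattern))
    return candidates[-1] if candidates else None


def copy_is_current(directory, dated_prefix, copy_name):
    """True if copy_name has the same contents as the newest dated dump."""
    newest = latest_dump(directory, dated_prefix)
    copy_path = os.path.join(directory, copy_name)
    if newest is None or not os.path.exists(copy_path):
        return False
    return filecmp.cmp(newest, copy_path, shallow=False)


# Demo on throwaway files; the timestamps below are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_files = [
        ("allele_phenotype.ace.20080331", "old dump\n"),
        ("allele_phenotype.ace.20080407", "new dump\n"),
        ("var_phen.ace", "new dump\n"),
        ("mol_phene.ace.20080407", "new molecule dump\n"),
        ("mol_phene.ace", "stale copy\n"),
    ]
    for name, text in demo_files:
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write(text)
    var_phen_ok = copy_is_current(tmp, "allele_phenotype.ace", "var_phen.ace")
    mol_phene_ok = copy_is_current(tmp, "mol_phene.ace", "mol_phene.ace")

print("var_phen.ace current:", var_phen_ok)
print("mol_phene.ace current:", mol_phene_ok)
```

In the demo the second check fails on purpose, to show what a stale copy looks like.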

/home/acedb/karen/WS_upload_scripts/paper_object/get_paper_object.pl*:

This script creates a file that will associate phenotype annotated alleles with their respective genes.
This is needed so that references from phenotype annotations are added to the references on the respective gene page. An example from this file:

Paper : WBPaper00000005
Gene	WBGene00000898
Allele	"WBVar00143949"	Inferred_Automatically	"Inferred automatically from curated phenotype"
Gene	WBGene00004015
Allele	"WBVar00000103"	Inferred_Automatically	"Inferred automatically from curated phenotype"
Gene	WBGene00004015
Allele	"WBVar00000103"	Inferred_Automatically	"Inferred automatically from curated phenotype"
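For spot-checking a dump of this shape, the Paper object can be read back into (paper, gene, allele) triples. This is a minimal Python sketch, not the logic of get_paper_object.pl. It assumes, from the sample above only, that each Allele line belongs to the Gene line directly before it; the function name gene_allele_pairs is hypothetical.

```python
def gene_allele_pairs(ace_text):
    """Return unique (paper, gene, allele) triples in order of appearance.

    Assumption: an Allele line is paired with the most recent Gene line.
    """
    pairs = []
    paper = None
    gene = None
    for raw in ace_text.splitlines():
        fields = raw.split()
        if not fields or fields[0].startswith("//"):
            continue  # skip blank lines and .ace comments
        tag = fields[0]
        if tag == "Paper":
            paper = fields[-1].strip('"')  # "Paper : WBPaper..." header
            gene = None
        elif tag == "Gene" and len(fields) > 1:
            gene = fields[1].strip('"')
        elif tag == "Allele" and len(fields) > 1 and gene is not None:
            triple = (paper, gene, fields[1].strip('"'))
            if triple not in pairs:
                pairs.append(triple)
    return pairs


sample = '''Paper : WBPaper00000005
Gene\tWBGene00000898
Allele\t"WBVar00143949"\tInferred_Automatically\t"Inferred automatically from curated phenotype"
Gene\tWBGene00004015
Allele\t"WBVar00000103"\tInferred_Automatically\t"Inferred automatically from curated phenotype"
Gene\tWBGene00004015
Allele\t"WBVar00000103"\tInferred_Automatically\t"Inferred automatically from curated phenotype"
'''

pairs = gene_allele_pairs(sample)
for paper, gene, allele in pairs:
    print(paper, gene, allele)
```

The repeated WBGene00004015 / WBVar00000103 pair in the sample collapses to one triple.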

Dumping allele_paper connections into .ace file

The script is at:
/home/acedb/..../get_paper_object.pl

Run it by going to that directory and redirecting output to a file

./get_paper_object.pl > file 

If there are errors, which is rare, they will look something like:

// ERR 
There's an object that has no name and no type, but still has a paper.

It has postgres database ID 2712

testdb=> SELECT * FROM app_tempname WHERE joinkey = '2712';
 joinkey | app_tempname | app_timestamp
---------+--------------+---------------
...
---------+------------------+---------------
...
---------+-----------------+-------------------------------
 2712 | WBPaper00002087 | 2006-07-11 11:52:20.208295-07
(1 row)

Here's all the data for it when it got migrated from the CGI
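Because errors are rare, it is easy to miss one in a long output file. The sketch below shows one way to list the lines that begin with the "// ERR" marker shown above, together with their line numbers. It is a Python illustration only; find_error_lines is a hypothetical helper, and the sample lines are adapted from the examples in this page.

```python
def find_error_lines(lines):
    """Return (line_number, text) for every line starting with '// ERR'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith("// ERR"):
            hits.append((number, line.rstrip("\n")))
    return hits


# Sample output; in practice pass an open file handle for the redirected
# output of get_paper_object.pl instead of this list.
sample_output = [
    "Paper : WBPaper00000005\n",
    "Gene\tWBGene00000898\n",
    "// ERR \n",
    "There's an object that has no name and no type, but still has a paper.\n",
]

errors = find_error_lines(sample_output)
for number, text in errors:
    print("line %d: %s" % (number, text))
```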

Testing Phenote .ace for upload

Evaluate how the uploaded data impacts WB by comparing object numbers before and after data upload. This means that you need access to the latest WS and have the means to edit the WS to read the .ace dump.

Test file locally on Acedb

This step is to test two things: First, that the file is readable. Second, that the file does not alter any objects it shouldn’t.

Launch local acedb:

$ cd Desktop/acedb
$ ./xace /.../WSXXX/acedb

Comparing data builds: From the latest build, record

  1. Number of strain objects
  2. Number of life stage objects
  3. Number of anatomy terms
  4. Number of variation-phenotype connections

use either AQL query:

select all class variation where exists_tag ->phenotype

select p, p->variation from p in class phenotype where exists_tag p->variation

  5. Number of variations that are alleles

use WQL

find variation variation_type=allele

or AQL

select a from a in class variation where exists_tag a->allele

  6. Number of alleles with a phenotype

Find all alleles with a phenotype

select v from v in class variation where exists_tag v->phenotype

Check that the .ace dump file is readable:

To load the .ace dump, select ‘Edit..’, choose ‘Read .ace file’, accept the change in write privileges, select ‘Open ace file’, find and select the file to open, then select ‘Read all’.

If the .ace file is okay, then 100% of the lines will have been read in. This will be noted in the second line called ‘Line:’. If the dump produced a bad file, the read in will stop at the point where the problem occurred. You can go to that line in the .ace file and check it out.

Check that the file does not alter any objects it shouldn’t:

Once the .ace dump is loaded into acedb, redo the counts for all objects as above. Compare object numbers between the latest database and the database with the new .ace file read in, to make sure the data in the dump looks reasonable, i.e., no lost data or inflated numbers.
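The before/after comparison can be kept honest by recording the counts in two small tables and comparing them mechanically. The Python sketch below is illustrative only: the category labels follow the list of counts above, the numbers are made-up placeholders rather than real WS counts, and compare_counts is a hypothetical helper.

```python
def compare_counts(before, after):
    """Return {name: (old, new, status)} for every counted object class."""
    report = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name)
        new = after.get(name)
        if old is None or new is None:
            status = "missing"      # counted in only one of the two runs
        elif new < old:
            status = "decreased"    # possible lost data
        elif new > old:
            status = "increased"    # expected for new annotations; check size
        else:
            status = "unchanged"
        report[name] = (old, new, status)
    return report


# Placeholder numbers for illustration only.
before = {"strain": 100, "life stage": 50, "anatomy term": 200,
          "alleles with a phenotype": 300}
after = {"strain": 100, "life stage": 49, "anatomy term": 200,
         "alleles with a phenotype": 320}

report = compare_counts(before, after)
for name, (old, new, status) in report.items():
    print("%-28s %6s -> %6s  %s" % (name, old, new, status))
```

A "decreased" status on a class the dump should not touch (strain, life stage, anatomy term) is the signal to inspect the .ace file before uploading.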

Uploading .ace for Wen into Citace

Deposit .ace for Wen in citace:

scp var_phen.ace xxxx@xxxx.caltech.edu:/.../Data