SOP for GO citace and GO consortium data uploads

From WormBaseWiki
====Generating a .ace file for GO annotations to Genes (Nov 2013 onwards)====
 
All scripts and files: /home/acedb/ranjana/citace_upload/go_curation
 
  
--Run the wrapper.pl script. This generates go.ace.<date> in /go_dumper_files, which contains all the annotations that are left in the OA (RNA genes, uncloned genes).
 
 
--Run the count_stuff_for_ace.pl script at /go_dumper_files to generate the numbers for the go.ace.<date> file.
 
 
--From /ptgo_to_ace, run the gpToAce.pl script. This generates gp_association.ace, which contains all the annotations from Protein2GO.
 
 
--scp go.ace.<date> and gp_association.ace to maya.caltech.edu and rename them as go_oa_WSXXX.ace and gp_association_WSXXX.ace respectively.
 
 
--Test the syntax of the files and the # of objects in a local citace mirror.
 
 
--scp files to citpub@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Ranjana/.
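The renaming step above can be sketched as a small shell helper. The function name `stage_go_files` and its two arguments (release and date suffix) are illustrative assumptions, not part of the original scripts. The scp destination is the one given above and is shown commented out.

```shell
# Hypothetical helper: copy the two dumps to their release-specific names.
# Usage: stage_go_files <release, e.g. WS240> <date suffix of go.ace.<date>>
stage_go_files() {
    release="$1"
    stamp="$2"
    # Refuse to continue if either input file is missing.
    for f in "go.ace.${stamp}" "gp_association.ace"; do
        [ -f "$f" ] || { echo "missing input: $f" >&2; return 1; }
    done
    cp "go.ace.${stamp}" "go_oa_${release}.ace"
    cp "gp_association.ace" "gp_association_${release}.ace"
    # Print the names of the files that are ready for testing and upload.
    echo "go_oa_${release}.ace gp_association_${release}.ace"
}

# After testing in the local citace mirror, upload as described above:
# scp go_oa_WS240.ace gp_association_WS240.ace \
#     citpub@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Ranjana/.
```

Copying rather than moving keeps the dated originals in place in case the dump needs to be compared or regenerated.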
 
 
====Generating a .ace for the GO terms (ontology)====
 
 
--A new GO terms ontology file is generated every build: at /home/acedb/ranjana/citace_upload/go_curation/go_obo2ace, run the go_obo_to_go_ace.pl script
 
 
--scp file to Maya, rename with upload number, as go_terms_WSXXX.ace
 
 
====Obo file of GO terms for citace upload====
 
--At /home/citace/Data_for_Ontology/ at citpub@spica.caltech.edu, use 'wget' to get the gene_ontology_edit.obo file from:


http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo


--Rename the file in the format gene_ontology.WSXXX.obo (e.g. gene_ontology.WS231.obo).
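The download and rename can be combined by letting wget write directly to the release-specific name. This is a minimal sketch: the function names `obo_name_for_release` and `fetch_go_obo` are hypothetical, and the URL is the one given above.

```shell
# Hypothetical helper: print the release-specific obo file name.
obo_name_for_release() {
    echo "gene_ontology.$1.obo"
}

# Download the obo file straight to its release-specific name.
# Intended to be run in the Data_for_Ontology directory on spica.
fetch_go_obo() {
    release="$1"
    wget -O "$(obo_name_for_release "$release")" \
        "http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo"
}

# Example (requires network access):
# fetch_go_obo WS231
```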
 
 
====List of files for a citace upload of GO data====
 
 
These files are deposited to citpub@spica in /home/citpub/Data_for_citace/Data_from_Ranjana/
 
 
1. go_oa_WSXXX.ace (manual annotations in the OA)
 
 
2. gp_association_WSXXX.ace (manual annotations from Protein2GO)
 
 
3. go_terms_WSXXX.ace (GO ontology)
 
 
4. variation2goterm_VarID.ace (file where allele names have been converted to WBVarIDs by Wen, use until this data is read into Postgres/OA).
 
 
5. phenotype2go_mappings.ace (consolidated phenotype2go mappings for any given build).
 
 
Submitted to /home/citpub/Data_for_Ontology/:
 
 
6. gene_ontology.WSXXX.obo.
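Before uploading, the list above can be checked mechanically. The sketch below covers files 1 to 5 only (the obo file goes to a different directory). The function name `check_go_upload_files` and its arguments are assumptions made for illustration.

```shell
# Hypothetical check: report which of the expected .ace upload files are
# missing from a directory for a given release (e.g. WS240).
# Returns 0 only if all five files are present.
check_go_upload_files() {
    dir="$1"
    release="$2"
    missing=0
    for f in \
        "go_oa_${release}.ace" \
        "gp_association_${release}.ace" \
        "go_terms_${release}.ace" \
        "variation2goterm_VarID.ace" \
        "phenotype2go_mappings.ace"
    do
        if [ ! -f "${dir}/${f}" ]; then
            echo "missing: ${f}"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example:
# check_go_upload_files . WS240 && echo "all files present"
```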
 
 
No longer submitted:
 
 
WBPaper00038491_genes.ace: added gene-to-paper connections for Daniel Shaye. These genes were added to the paper editor, so this file is no longer put into citace manually.
 
 
====Numbers for citace upload====
 
As reported by a local empty citace mirror:
 
 
 
 
 
====Generating a gene association file (since Nov 2013)====
 
 
====Uploading the gene association file to the GO consortium repository====
 
 
====Old SOP for generating a .ace file====
 
 
On Tazendra, acedb account:
 
 
--run the ./wrapper.pl script at /home/acedb/ranjana/citace_upload/go_curation/
 
 
--./wrapper.pl dumps both go.ace and go.go files under /home/acedb/ranjana/citace_upload/go_curation/go_dumper_files/ with dates appended
 
 
-- go.go.20090731 and go.ace.20090731.091726 files created under /go_dumper_files
 
 
--Run the check_go_ace.pl script as './check_go_ace.pl filename' (NOTE: THIS SCRIPT IS NO LONGER RUN).

./check_go_ace.pl then strips out errors that don't have to do with the Gene header, and puts all errors in error_files/go.err.time (if the input is in the go.ace.time format, it replaces the ace part with err).
 
 
--As of now the script removes only the erroneous line, but not the associated curator_confirmed line directly under it, which needs to be removed manually. Need to think about this.
 
 
--Run count_stuff_for_ace.pl on the file to get the numbers. Note: worked with JC to modify check_go_ace.pl; that script is no longer relevant and can be skipped, since we are using the OA.
 
 
--scp file to maya.caltech.edu and rename file in format:
 
032107_WS174_go_dump.ace
 
 
--Manually remove the following annotations, which are actually 'NOT' annotations:
 
 
mtm-9 WBGene00003479 GO:0004438
 
 
vha-2 WBGene00006911 GO:0009790--looks like annotation was removed manually, no longer in dump
 
 
vha-3 WBGene00006912 GO:0009790--looks like annotation was removed manually, no longer in dump
 
 
hsp-60 WBGene00002025 GO:0009408 (added from WS194 upload)
 
 
hsp-12.3 WBGene00002012 GO:0051082 (added from WS202 upload)
 
 
hsp-12.6 WBGene00002013 GO:0051082 and GO:0006950
 
 
--Test file syntax and # of objects in local citace mirror on Juno:
 
 
Read in file for syntax errors
 
 
Count # of WBGenes, Papers, WBPersons before and after loading the ace file
 
 
--scp file to citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Ranjana/.
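As a quick pre-check before loading the file into the mirror, the number of distinct Gene objects in the .ace file itself can be counted with standard tools. This is a sketch only: it assumes each object starts with a header line such as `Gene : "WBGene00003479"`, it does not reproduce what count_stuff_for_ace.pl reports, and it does not replace the before/after counts in the citace mirror. The function name `count_ace_genes` is hypothetical.

```shell
# Hypothetical sanity check: count distinct Gene objects in a .ace file,
# assuming each object begins with a line like: Gene : "WBGene00003479"
count_ace_genes() {
    # Keep only Gene header lines, collapse repeats, and count what is left.
    grep -E '^Gene[[:space:]]*:' "$1" | sort -u | wc -l | tr -d ' '
}

# Example:
# count_ace_genes 032107_WS174_go_dump.ace
```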
 
 
The following files are submitted to the citace account on citace@spica.caltech.edu every build:
 
 
To: /home/citace/Data_for_citace/Data_from_Ranjana/
 
 
1. date_WSXXX_go_dump.ace (dumped from postgres, from the manual curation via Phenote)
 
 
2. variation2goterm_VarID.ace. This is the file where allele names have been converted to WBVarIDs by Wen. Use this file until this data is read into Postgres.
 
 
3. phenotype2go_mappings.ace (consolidated phenotype2go mappings for any given build).
 
 
4. A new GO terms ontology file is generated at /home/acedb/ranjana/citace_upload/go_curation/go_obo2ace using the go_obo_to_go_ace.pl script, rename with upload number, eg. go_terms_WS240.ace: (We used to submit a WSXXXGOterms.ace file that Wen dumped, no longer used)
 
 
All of the above files are submitted to: /home/citace/Data_for_Ontology/ at citace@spica.caltech.edu
 
 
NOTE:These genes were added to the paper editor, so this file is no longer manually being put into citace.
 
 
5. WBPaper00038491_genes.ace added genes to paper connection for Daniel Shaye
 
 
Change directory to: Data_for_Ontology/, under /home/citace/.
 
 
Here use 'wget' to get the gene_ontology_edit.obo file from:


http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo


Rename the file in the format gene_ontology.WSXXX.obo (e.g. gene_ontology.WS231.obo).
 
 
====Old SOP for generating a gene association file====
 
 
In the acedb user account on Tazendra, at /home/acedb/ranjana/GO:
 
--Use ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS211/ONTOLOGY/gene_association.WS211.wb.ce
 
 
--Use 'grep IEA gene_association.WSXXX.wb.ce > gene_association.wb.electronic' to separate the IEAs.
 
 
--Use 'grep WBPhenotype gene_association.WSXXX.wb.ce > gene_association.wb.rnai2go' to get the phenotype-based annotations, i.e. both Erich's earlier RNAi2GO ones and the new associations based on allele phenotypes that went into WormBase WS186.
 
 
--Copy the right go.go.<date> file from /home/acedb/ranjana/citace_upload/go_curation/go_dumper_files/ to this directory and change its name to gene_association.wb.manual.
 
 
--new GOA elegans file, from 04.02.12, for external annots (use 'wget ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/9.C_elegans.goa')
 
 
--Run the ./wrapper.pl script
 
Output will include the various error types
 
 
--Run ./strip_errors_and_concatenate.pl
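The two grep steps above can be wrapped in one function, as sketched below. The function names are hypothetical. Plain grep matches the string anywhere on the line, so the second function shows a stricter alternative that is not part of the original SOP: it assumes a tab-separated GAF file with the evidence code in column 7 and selects only rows where that column is exactly IEA.

```shell
# Hypothetical wrapper around the two grep steps above.
# Writes gene_association.wb.electronic and gene_association.wb.rnai2go
# into the current directory.
split_gene_association() {
    gaf="$1"
    grep IEA "$gaf" > gene_association.wb.electronic
    grep WBPhenotype "$gaf" > gene_association.wb.rnai2go
}

# Stricter alternative (an assumption, not the original method): match IEA
# only in the evidence-code column of a tab-separated GAF file.
iea_by_evidence_column() {
    awk -F'\t' '$7 == "IEA"' "$1"
}

# Example:
# split_gene_association gene_association.WS211.wb.ce
```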
 
 
Scp the generated gene association file to a local machine for post-processing and upload to the GOC.


In the tmp directory on Maya:


--scp file to Maya
 
 
--removed 'NOT' annotations from mtm-9, vha-2, vha-3, hsp-60, hsp-12.3, hsp-12.6. (We do not take out NOT annotations anymore)
 
 
--removed header from the middle of concatenated file in two places (on top of UniProt file too, search for 'gaf-version') and placed on top of file (correct minor mistake in header--space after the $ on one of the lines)
 
 
--And move the following header from the middle of file to the top of file:
 
 
!Version: $Revision: $
 
 
!Organism: Caenorhabditis elegans
 
 
!date: $Date: $
 
 
!From: WormBase
 
 
--Add these two lines at the bottom of header:
 
 
!DataBase_Project_Name: WormBase WS215/WS216
 
 
!gaf-version: 2.0
 
 
--Remove the header 'gaf 2.0', from the top of the UniProt file
 
 
--gzip file
 
 
--Copy file to the tmp directory
 
 
Use SVN commands to upload to the GO. Also update the README file with every upload.
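The manual header clean-up described above can be sketched as a single pass. This is a sketch only, and it assumes that every line starting with `!` in the concatenated file is header or comment material that can be dropped and replaced by one header at the top. The function name `normalize_gaf_header` and its arguments are hypothetical. The header lines and their order are the ones listed above.

```shell
# Hypothetical helper: write one header at the top of the output file and
# drop every '!' line from the body of the concatenated input file.
# Usage: normalize_gaf_header <input> <output> <project, e.g. "WormBase WS215/WS216">
normalize_gaf_header() {
    in="$1"
    out="$2"
    project="$3"
    {
        # Header lines as listed above, followed by the two added lines.
        printf '%s\n' \
            '!Version: $Revision: $' \
            '!Organism: Caenorhabditis elegans' \
            '!date: $Date: $' \
            '!From: WormBase' \
            '!DataBase_Project_Name: '"${project}" \
            '!gaf-version: 2.0'
        # Annotation rows only: everything that does not start with '!'.
        grep -v '^!' "$in"
    } > "$out"
}

# Example, followed by the gzip step:
# normalize_gaf_header concatenated_file gene_association.wb "WormBase WS215/WS216"
# gzip gene_association.wb
```

Review the result before upload, since any comment line in the input that should have been kept is removed by this approach.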
 

Latest revision as of 20:12, 3 February 2014