SOP for generating GO files for citace and GO consortium uploads
- 1 Generating a .ace file for GO annotations to Genes (Nov 2013 onwards)
- 2 Generating a .ace for the GO terms (ontology)
- 3 Obo file of GO terms for citace upload
- 4 List of files for a citace upload of GO data
- 5 Numbers for citace upload
- 6 Generating a gene association file (since Nov 2013)
- 7 Uploading the gene association file to the GO consortium repository
- 8 Old SOP for generating a .ace file
- 9 Old SOP for generating a gene association file
Generating a .ace file for GO annotations to Genes (Nov 2013 onwards)
All scripts and files are under /home/acedb/ranjana/citace_upload/go_curation.
--Run the wrapper.pl script. It generates go.ace.<date> in go_dumper_files/, containing all annotations that remain in the OA (RNA genes, uncloned genes).
--Run the count_stuff_for_ace.pl script in go_dumper_files/ to generate counts for the go.ace.<date> file.
--From the ptgo_to_ace directory, run the gpToAce.pl script. It generates gp_association.ace, containing all annotations from Protein2GO.
--scp go.ace.<date> and gp_association.ace to maya.caltech.edu and rename them go_oa_WSXXX.ace and gp_association_WSXXX.ace, respectively.
--Test the syntax of the files and the number of objects in the local citace mirror.
--scp the files to email@example.com:/home/citace/Data_for_citace/Data_from_Ranjana/.
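The counts from count_stuff_for_ace.pl and the object check in the citace mirror can be cross-checked with a quick tally of the .ace file itself. The following is a minimal Python sketch, not a reimplementation of count_stuff_for_ace.pl: it assumes the usual .ace layout (each object starts with a `Class : "Name"` line and ends at a blank line; `//` lines are comments) and simply counts objects per class and tag lines per tag name.

```python
import re
import sys
from collections import Counter

# Assumed .ace layout: an object starts with a header line such as
#   Gene : "WBGene00003479"
# followed by tag lines, and ends at a blank line; "//" lines are comments.
HEADER = re.compile(r'^(\w+)\s*:\s*\S')


def count_ace(lines):
    """Return (objects per class, tag lines per tag name) for a .ace file."""
    objects = Counter()
    tags = Counter()
    at_start = True  # True while waiting for the first line of a new object
    for raw in lines:
        line = raw.strip()
        if not line:
            at_start = True          # a blank line closes the current object
            continue
        if line.startswith("//"):
            continue                 # comment line
        if at_start:
            match = HEADER.match(line)
            if match:
                objects[match.group(1)] += 1
            at_start = False
        else:
            tags[line.split()[0]] += 1
    return objects, tags


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        objects, tags = count_ace(fh)
    for cls, n in sorted(objects.items()):
        print(f"{cls}\t{n} objects")
    for tag, n in sorted(tags.items()):
        print(f"{tag}\t{n} tag lines")
```

Usage (the script name is hypothetical): `python count_ace.py go_oa_WSXXX.ace`.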
Generating a .ace for the GO terms (ontology)
--A new GO terms ontology file is generated for every build: in /home/acedb/ranjana/citace_upload/go_curation/go_obo2ace, run the go_obo_to_go_ace.pl script.
--scp the file to Maya and rename it with the upload number, as go_terms_WSXXX.ace.
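go_obo_to_go_ace.pl is not reproduced here, but the kind of conversion it performs can be sketched in Python. This sketch assumes only the standard OBO stanza layout (`[Term]` blocks with `id:`, `name:`, `namespace:` and `is_obsolete:` tags). The `Name` and `Type` tags it writes are hypothetical placeholders, not the actual WormBase GO_term model, and obsolete terms are simply skipped.

```python
import sys


def parse_obo_terms(lines):
    """Yield a dict per [Term] stanza, keeping the first value seen for each tag."""
    term = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            if term:
                yield term
            # Only [Term] stanzas are kept; [Typedef] and others are ignored.
            term = {} if line == "[Term]" else None
            continue
        if term is None or ":" not in line:
            continue  # header lines, blank lines, lines outside a [Term]
        tag, value = line.split(":", 1)
        term.setdefault(tag.strip(), value.strip())
    if term:
        yield term


def ace_quote(text):
    """Quote a value for .ace, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def terms_to_ace(terms):
    """Render non-obsolete terms as .ace objects separated by blank lines."""
    out = []
    for term in terms:
        if "id" not in term or term.get("is_obsolete") == "true":
            continue
        out.append(f"GO_term : {ace_quote(term['id'])}")
        # HYPOTHETICAL tag names: replace with the tags the real model uses.
        if "name" in term:
            out.append(f"Name {ace_quote(term['name'])}")
        if "namespace" in term:
            out.append(f"Type {ace_quote(term['namespace'])}")
        out.append("")  # a blank line ends the object
    return "\n".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(terms_to_ace(parse_obo_terms(fh)))
```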
Obo file of GO terms for citace upload
--In /home/citace/Data_for_Ontology/ on firstname.lastname@example.org, use 'wget' to get the gene_ontology_edit.obo file from:
--Rename the file with the build number, in the format gene_ontology.WS231.obo.
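Before renaming, it can help to record which GO release was actually downloaded. The sketch below reads the header of the .obo file (the tags before the first stanza, such as `format-version`, `data-version` and `date` in OBO 1.2) and renames the file to the gene_ontology.WSXXX.obo pattern. The script name and its argument order are assumptions, not part of the existing tooling.

```python
import os
import re
import sys


def obo_header(lines):
    """Return the header tags of an OBO file (everything before the first stanza)."""
    header = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            break  # first stanza reached: the header is over
        if ":" in line:
            tag, value = line.split(":", 1)
            header.setdefault(tag.strip(), value.strip())
    return header


def versioned_obo_name(build):
    """gene_ontology.<build>.obo, e.g. gene_ontology.WS231.obo."""
    if not re.fullmatch(r"WS\d+", build):
        raise ValueError(f"expected a build such as WS231, got {build!r}")
    return f"gene_ontology.{build}.obo"


if __name__ == "__main__" and len(sys.argv) == 3:
    src, build = sys.argv[1], sys.argv[2]  # e.g. gene_ontology_edit.obo WS231
    with open(src, encoding="utf-8") as fh:
        header = obo_header(fh)
    for tag in ("format-version", "data-version", "date"):
        print(f"{tag}: {header.get(tag, '(not present)')}")
    dst = os.path.join(os.path.dirname(src), versioned_obo_name(build))
    if os.path.exists(dst):
        sys.exit(f"{dst} already exists; not overwriting it")
    os.rename(src, dst)
    print(f"renamed {src} -> {dst}")
```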
List of files for a citace upload of GO data
These files are deposited on citpub@spica in /home/citpub/Data_for_citace/Data_from_Ranjana/:
1. go_oa_WSXXX.ace (manual annotations in the OA)
2. gp_association_WSXXX.ace (manual annotations from Protein2GO)
3. go_terms_WSXXX.ace (GO ontology)
4. variation2goterm_VarID.ace (allele names converted to WBVarIDs by Wen; use until this data is read into Postgres/OA).
5. phenotype2go_mappings.ace (consolidated phenotype2go mappings for any given build).
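A quick pre-deposit check that all five files are present for a given build could look like the following minimal sketch. The file-name templates are taken from the list above, with `{build}` standing in for WSXXX; the script itself and its argument order are hypothetical.

```python
import os
import re
import sys

# Templates from the list of files above; {build} stands for the WSXXX build.
EXPECTED = (
    "go_oa_{build}.ace",
    "gp_association_{build}.ace",
    "go_terms_{build}.ace",
    "variation2goterm_VarID.ace",
    "phenotype2go_mappings.ace",
)


def expected_files(build):
    """Concrete file names for one build, e.g. go_oa_WS240.ace."""
    if not re.fullmatch(r"WS\d+", build):
        raise ValueError(f"expected a build such as WS240, got {build!r}")
    return [template.format(build=build) for template in EXPECTED]


def missing_files(listing, build):
    """Names from expected_files(build) that are absent from a directory listing."""
    present = set(listing)
    return [name for name in expected_files(build) if name not in present]


if __name__ == "__main__" and len(sys.argv) == 3:
    directory, build = sys.argv[1], sys.argv[2]
    missing = missing_files(os.listdir(directory), build)
    if missing:
        sys.exit("missing: " + ", ".join(missing))
    print(f"all {len(EXPECTED)} files present for {build}")
```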
No longer submitted:
WBPaper00038491_genes.ace (gene-to-paper connections added for Daniel Shaye). These genes were added through the paper editor, so this file is no longer put into citace manually.
Numbers for citace upload
As reported by a local empty citace mirror:
Generating a gene association file (since Nov 2013)
Uploading the gene association file to the GO consortium repository
Old SOP for generating a .ace file
On Tazendra, acedb account:
--Run the ./wrapper.pl script in /home/acedb/ranjana/citace_upload/go_curation/.
--wrapper.pl dumps both go.ace and go.go files, with dates appended, under /home/acedb/ranjana/citace_upload/go_curation/go_dumper_files/.
--For example, go.go.20090731 and go.ace.20090731.091726 are created under go_dumper_files/.
--Run the check_go_ace.pl script as './check_go_ace.pl filename' (NOTE: THIS SCRIPT IS NO LONGER RUN). It strips out errors that don't have to do with the Gene header and writes all errors to error_files/go.err.time (if the input file name is in the go.ace.time format, the 'ace' part is replaced with 'err').
--As of now, the script removes only the erroneous line, not the associated curator_confirmed line directly under it; that line must be removed manually. This still needs to be resolved.
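The manual clean-up described above could be scripted along these lines. This hypothetical helper assumes the erroneous lines have already been identified (as 0-based line indexes) and that the companion line begins with the tag `Curator_confirmed`; it drops each flagged line and, if the very next line starts with that tag, drops it too.

```python
def drop_with_curator_line(lines, bad_indexes, follower="curator_confirmed"):
    """Remove flagged lines plus the companion line directly beneath each one.

    lines:       the .ace file as a list of lines
    bad_indexes: 0-based indexes of the erroneous lines (found by some checker)
    follower:    tag starting the companion line, matched case-insensitively
                 (assumed to be Curator_confirmed, as described above)
    """
    bad = set(bad_indexes)
    kept = []
    previous_was_bad = False
    for i, line in enumerate(lines):
        if i in bad:
            previous_was_bad = True
            continue
        if previous_was_bad and line.strip().lower().startswith(follower.lower()):
            previous_was_bad = False
            continue  # companion line of a removed annotation
        previous_was_bad = False
        kept.append(line)
    return kept


if __name__ == "__main__":
    sample = [
        'Gene : "WBGene00000001"',
        'GO_term "GO:0000001" IDA',      # flagged as erroneous below
        'Curator_confirmed "WBPerson1"',  # removed together with it
        'GO_term "GO:0000002" IMP',
        'Curator_confirmed "WBPerson1"',
    ]
    print("\n".join(drop_with_curator_line(sample, [1])))
```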
--Run count_stuff_for_ace.pl on the file to get the numbers. Note: worked with JC to modify check_go_ace.pl; this script is no longer relevant and can be skipped, since we are using the OA.
--scp the file to maya.caltech.edu and rename it in the format 032107_WS174_go_dump.ace.
--Manually remove the following annotations, which are actually 'NOT' annotations:
mtm-9 WBGene00003479 GO:0004438
vha-2 WBGene00006911 GO:0009790 (annotation appears to have been removed manually; no longer in dump)
vha-3 WBGene00006912 GO:0009790 (annotation appears to have been removed manually; no longer in dump)
hsp-60 WBGene00002025 GO:0009408 (added from WS194 upload)
hsp-12.3 WBGene00002012 GO:0051082 (added from WS202 upload)
hsp-12.6 WBGene00002013 GO:0051082 and GO:0006950
--Test the file syntax and the number of objects in the local citace mirror on Juno:
Read in the file and check for syntax errors.
Count the number of WBGenes, Papers and WBPersons before and after loading the .ace file.
--scp the file to email@example.com:/home/citace/Data_for_citace/Data_from_Ranjana/.
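The before/after comparison in the test step amounts to a small diff of per-class counts. In this sketch the counts are assumed to be read off the local mirror by hand and passed in as dictionaries, and `Gene`, `Paper` and `Person` are assumed to be the class names for WBGenes, Papers and WBPersons. A class whose count grows presumably means the file refers to IDs that were not already in the database (for example, a mistyped WBGene ID), which is worth checking before the scp step.

```python
# Assumed class names for WBGenes, Papers and WBPersons.
CLASSES = ("Gene", "Paper", "Person")


def count_delta(before, after, classes=CLASSES):
    """{class: (before, after, change)} for the classes of interest."""
    report = {}
    for cls in classes:
        b, a = before.get(cls, 0), after.get(cls, 0)
        report[cls] = (b, a, a - b)
    return report


def grown_classes(before, after, classes=CLASSES):
    """Classes whose object count went up after loading the .ace file.

    Loading GO annotations should attach them to existing objects, so growth
    presumably means the file names IDs not already in the database.
    """
    delta = count_delta(before, after, classes)
    return [cls for cls, (_, _, change) in delta.items() if change > 0]


if __name__ == "__main__":
    # Placeholder numbers: substitute the counts read off the local mirror.
    before = {"Gene": 10, "Paper": 5, "Person": 3}
    after = {"Gene": 11, "Paper": 5, "Person": 3}
    for cls, (b, a, change) in count_delta(before, after).items():
        print(f"{cls}: {b} -> {a} ({change:+d})")
    print("check IDs for:", grown_classes(before, after) or "none")
```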
The following files are submitted to the citace account on firstname.lastname@example.org every build:
1. date_WSXXX_go_dump.ace (dumped from Postgres; manual curation via Phenote)
2. variation2goterm_VarID.ace. This is the file where allele names have been converted to WBVarIDs by Wen. Use this file until this data is read into Postgres.
3. phenotype2go_mappings.ace (consolidated phenotype2go mappings for any given build).
4. A new GO terms ontology file, generated in /home/acedb/ranjana/citace_upload/go_curation/go_obo2ace using the go_obo_to_go_ace.pl script and renamed with the upload number, e.g. go_terms_WS240.ace. (We used to submit a WSXXXGOterms.ace file dumped by Wen; it is no longer used.)
All of the above files are submitted to: /home/citace/Data_for_Ontology/ at email@example.com
5. WBPaper00038491_genes.ace (gene-to-paper connections added for Daniel Shaye).
NOTE: These genes were added through the paper editor, so this file is no longer manually put into citace.
Change directory to /home/citace/Data_for_Ontology/.
There, use 'wget' to get the gene_ontology_edit.obo file from
Rename the file in the format gene_ontology.WS231.obo.
Old SOP for generating a gene association file
In the acedb user account on Tazendra, in /home/acedb/ranjana/GO:
--Use the gene association file from the release, e.g. ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS211/ONTOLOGY/gene_association.WS211.wb.ce
--Use 'grep IEA gene_association.WSXXX.wb.ce > gene_association.wb.electronic' to separate out the IEAs.
--Use 'grep WBPhenotype gene_association.WSXXX.wb.ce > gene_association.wb.rnai2go' to get both Erich's earlier RNAi2GO annotations and the newer associations based on allele phenotypes that went into WormBase WS186.
--Copy the appropriate go.go.<date> file from /home/acedb/ranjana/citace_upload/go_curation/go_dumper_files/ to this directory and rename it gene_association.wb.manual.
--New GOA C. elegans file (from 04.02.12) for external annotations: use 'wget ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/9.C_elegans.goa'
--Run the ./wrapper.pl script. The output will include the various error types.
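The two grep commands match their pattern anywhere on a line, so `grep IEA` would also pick up a line that merely contains the string IEA in some other column. A slightly stricter Python sketch tests GAF column 7 (Evidence Code) for IEA, while keeping the WBPhenotype test line-wide as the grep does; as with the greps, one line can land in both outputs. The output file names mirror those above; the script name is hypothetical.

```python
import sys

EVIDENCE_COL = 6  # 0-based index of GAF column 7, "Evidence Code"


def split_gaf(lines):
    """Return (electronic, phenotype) annotation lines, header lines excluded.

    electronic: evidence code column is exactly IEA (stricter than grep IEA)
    phenotype:  'WBPhenotype' anywhere on the line (same test as the grep)
    A line can appear in both lists, just as with the two grep commands.
    """
    electronic, phenotype = [], []
    for line in lines:
        if line.startswith("!"):
            continue  # GAF header/comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) > EVIDENCE_COL and cols[EVIDENCE_COL] == "IEA":
            electronic.append(line)
        if "WBPhenotype" in line:
            phenotype.append(line)
    return electronic, phenotype


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        electronic, phenotype = split_gaf(fh)
    with open("gene_association.wb.electronic", "w", encoding="utf-8") as out:
        out.writelines(electronic)
    with open("gene_association.wb.rnai2go", "w", encoding="utf-8") as out:
        out.writelines(phenotype)
```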
Use scp to copy the generated gene association file to a local machine for post-processing and upload to the GOC.
In the tmp directory on Maya:
--scp the file to Maya.
--Remove 'NOT' annotations from mtm-9, vha-2, vha-3, hsp-60, hsp-12.3 and hsp-12.6. (We no longer take out NOT annotations.)
--Remove the header from the middle of the concatenated file in two places (including the top of the UniProt file; search for 'gaf-version') and place it at the top of the file. Correct a minor mistake in the header (a space after the $ on one of the lines).
--Move the following header lines from the middle of the file to the top of the file:
!Version: $Revision: $
!Organism: Caenorhabditis elegans
!date: $Date: $
--Add these two lines at the bottom of the header:
!DataBase_Project_Name: WormBase WS215/WS216
--Remove the 'gaf 2.0' header from the top of the UniProt file.
--Copy the file to the tmp directory.
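Most of the header clean-up on the concatenated file can be sketched as below: every `!` line is pulled out of the body in first-seen order without duplicates, the repeated `gaf-version` lines are collapsed into a single `!gaf-version: 2.0` first line (GAF 2.0 expects this as the first line of the file), and any extra lines, such as the DataBase_Project_Name line, are appended to the bottom of the header. This is a sketch under those assumptions; fixing the stray space after the `$` is still a manual step, and `WSXXX` in the example call is a placeholder.

```python
import sys

GAF_VERSION = "!gaf-version: 2.0"


def consolidate_header(lines, extra_header=()):
    """Return one header block at the top followed by all annotation rows.

    - '!' lines are pulled out of the body in first-seen order, duplicates dropped
    - every gaf-version line is dropped and a single GAF_VERSION line goes first
    - extra_header lines are appended at the bottom of the header
    Blank lines are dropped; line endings are stripped from the returned lines.
    """
    header, body, seen = [], [], set()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("!"):
            if line.strip():
                body.append(line)
            continue
        if line.lower().startswith("!gaf-version"):
            continue
        if line not in seen:
            seen.add(line)
            header.append(line)
    for line in extra_header:
        if line not in seen:
            seen.add(line)
            header.append(line)
    return [GAF_VERSION] + header + body


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as fh:
        merged = consolidate_header(fh, ["!DataBase_Project_Name: WormBase WSXXX"])
    with open(sys.argv[2], "w", encoding="utf-8") as out:
        out.write("\n".join(merged) + "\n")
```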
Use SVN commands to upload the file to the GO consortium repository, and update the README file with every upload.