− | ====Generating a .ace file for GO annotations to Genes (Nov 2013 onwards)====
| |
− | All scripts and files: /home/acedb/ranjana/citace_upload/go_curation
| |
| | | |
− | --Run the wrapper.pl script; this generates go.ace.<date> in /go_dumper_files, which contains all the annotations that are left in the OA
| |
− | (RNA genes, uncloned genes)
| |
− |
| |
− | --The count_stuff_for_ace.pl script in /go_dumper_files generates the counts for the go.ace.<date> file.
| |
− |
| |
− | --From /ptgo_to_ace, run the gpToAce.pl script; this generates gp_association.ace, which contains all the annotations from Protein2GO
| |
− |
| |
− | --scp go.ace.<date> and gp_association.ace to maya.caltech.edu and rename them as go_oa_WSXXX.ace and gp_association_WSXXX.ace respectively.
| |
− |
| |
− | --Test the syntax of the files and the # of objects in a local citace mirror
| |
− |
| |
− | --scp files to citpub@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Ranjana/.
| |
− |
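The renaming step above can be sketched as a small shell helper. This is only an illustration: the function names `release_name` and `copy_to_maya` are hypothetical, and the destination (the home directory of `$USER` on maya.caltech.edu) is an assumption, since the SOP does not name a target directory.

```sh
#!/bin/sh
# Hypothetical helper: print the release-stamped name for a generated file.
# Usage: release_name <source-file> <WS release, e.g. WS240>
release_name() {
    case "$(basename "$1")" in
        go.ace.*)           printf 'go_oa_%s.ace\n' "$2" ;;
        gp_association.ace) printf 'gp_association_%s.ace\n' "$2" ;;
        *)                  printf 'unrecognized file: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper: copy one file to maya under its release-stamped name.
# The destination directory (remote home) is an assumption, not part of the SOP.
copy_to_maya() {
    target=$(release_name "$1" "$2") || return 1
    scp "$1" "$USER@maya.caltech.edu:$target"
}
```

For example, `copy_to_maya go_dumper_files/go.ace.<date> WSXXX` would copy the dump as go_oa_WSXXX.ace. Nothing is copied when the file name is not one of the two recognized patterns.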
| |
− | ====Generating a .ace for the GO terms (ontology)====
| |
− |
| |
− | --A new GO terms ontology file is generated every build: at /home/acedb/ranjana/citace_upload/go_curation/go_obo2ace, run the go_obo_to_go_ace.pl script
| |
− |
| |
− | --scp file to Maya, rename with upload number, as go_terms_WSXXX.ace
| |
− |
| |
− | ====Obo file of GO terms for citace upload====
| |
− | --At /home/citace/Data_for_Ontology/ at citpub@spica.caltech.edu, use 'wget' to get gene_ontology_edit.obo file from:
| |
− |
| |
− | http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo.
| |
− |
| |
− | --Rename the file as gene_ontology.WSXXX.obo (e.g., gene_ontology.WS231.obo).
| |
− |
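The download and rename can be combined into one step by letting wget write straight to the release-stamped name. A minimal sketch, assuming it is run from the Data_for_Ontology directory; the function names `obo_name` and `fetch_obo` are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: build the release-stamped obo file name.
# Usage: obo_name <WS release, e.g. WS231>
obo_name() {
    printf 'gene_ontology.%s.obo\n' "$1"
}

# Hypothetical wrapper: download the obo file directly under that name.
# The URL is the one given in this SOP; -O sets the output file name.
fetch_obo() {
    wget -O "$(obo_name "$1")" \
        'http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo'
}
```

Calling `fetch_obo WS231` would leave gene_ontology.WS231.obo in the current directory.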
| |
− | ====List of files for a citace upload of GO data====
| |
− |
| |
− | These files are deposited to citpub@spica in /home/citpub/Data_for_citace/Data_from_Ranjana/
| |
− |
| |
− | 1. go_oa_WSXXX.ace (manual annotations in the OA)
| |
− |
| |
− | 2. gp_association_WSXXX.ace (manual annotations from Protein2GO)
| |
− |
| |
− | 3. go_terms_WSXXX.ace (GO ontology)
| |
− |
| |
− | 4. variation2goterm_VarID.ace (file where allele names have been converted to WBVarIDs by Wen, use until this data is read into Postgres/OA).
| |
− |
| |
− | 5. phenotype2go_mappings.ace (consolidated phenotype2go mappings for any given build).
| |
− |
| |
− | Submitted to /home/citpub/Data_for_Ontology/:
| |
− |
| |
− | 6. gene_ontology.WSXXX.obo.
| |
− |
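Before copying, the list above can be checked mechanically. The sketch below covers only files 1 to 5 (the ones that go to Data_from_Ranjana); the obo file goes to a different directory and is not checked here. The function name `missing_upload_files` is hypothetical.

```sh
#!/bin/sh
# Hypothetical check: print the name of every expected upload file
# that is missing from a directory.
# Usage: missing_upload_files <directory> <WS release, e.g. WS240>
missing_upload_files() {
    dir=$1
    ws=$2
    for f in "go_oa_${ws}.ace" \
             "gp_association_${ws}.ace" \
             "go_terms_${ws}.ace" \
             variation2goterm_VarID.ace \
             phenotype2go_mappings.ace; do
        # Report the file only when it does not exist in the directory.
        [ -f "$dir/$f" ] || printf '%s\n' "$f"
    done
}
```

An empty output means all five files are present in the directory.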
| |
− | No longer submitted:
| |
− |
| |
− | WBPaper00038491_genes.ace, which added gene-to-paper connections for Daniel Shaye. These genes were added to the paper editor, so this file is no longer manually put into citace.
| |
− |
| |
− | ====Numbers for citace upload====
| |
− | As reported by a local empty citace mirror:
| |
− |
| |
− |
| |
− |
| |
− |
| |
− | ====Generating a gene association file (since Nov 2013)====
| |
− |
| |
− | ====Uploading the gene association file to the GO consortium repository====
| |
− |
| |
− | ====Old SOP for generating a .ace file====
| |
− |
| |
− | On Tazendra, acedb account:
| |
− |
| |
− | --run the ./wrapper.pl script at /home/acedb/ranjana/citace_upload/go_curation/
| |
− |
| |
− | --./wrapper.pl dumps both go.ace and go.go files under /home/acedb/ranjana/citace_upload/go_curation/go_dumper_files/ with dates appended
| |
− |
| |
− | -- go.go.20090731 and go.ace.20090731.091726 files created under /go_dumper_files
| |
− |
| |
− | --Run the check_go_ace.pl script as './check_go_ace.pl filename' (NOTE: THIS SCRIPT IS NO LONGER RUN).
| |
− | The script strips out errors that don't have to do with the Gene header and puts all errors in error_files/go.err.time (if the input is in the go.ace.time format, it replaces the 'ace' part with 'err').
| |
− |
| |
− | --As of now, the script removes only the erroneous line, not the associated curator_confirmed line directly under it, which needs to be removed manually. Need to think about this.
| |
− |
| |
− | --Run count_stuff_for_ace.pl on the file to get the numbers. Note: worked with JC to modify check_go_ace.pl; that script is actually no longer relevant and can be skipped, since we are using the OA.
| |
− |
| |
− | --scp file to maya.caltech.edu and rename file in format:
| |
− | 032107_WS174_go_dump.ace
| |
− |
| |
− | --Manually remove the following annotations, which are actually 'NOT' annotations:
| |
− |
| |
− | mtm-9 WBGene00003479 GO:0004438
| |
− |
| |
− | vha-2 WBGene00006911 GO:0009790--looks like annotation was removed manually, no longer in dump
| |
− |
| |
− | vha-3 WBGene00006912 GO:0009790--looks like annotation was removed manually, no longer in dump
| |
− |
| |
− | hsp-60 WBGene00002025 GO:0009408 (added from WS194 upload)
| |
− |
| |
− | hsp-12.3 WBGene00002012 GO:0051082 (added from WS202 upload)
| |
− |
| |
− | hsp-12.6 WBGene00002013 GO:0051082 and GO:0006950
| |
− |
| |
− | --Test file syntax and # of objects in local citace mirror on Juno:
| |
− |
| |
− | Read in file for syntax errors
| |
− |
| |
− | Count the # of WBGenes, Papers, and WBPersons before and after loading the ace file
| |
− |
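As a rough text-level cross-check on the numbers reported by the citace mirror, the distinct identifiers in the .ace file itself can be counted. This is only a sketch and not a replacement for loading the file: it assumes identifiers appear as a prefix followed by digits (WBGene, WBPaper, WBPerson), and the function name `count_ids` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: count distinct identifiers with a given prefix in a file.
# Usage: count_ids <file> <prefix, e.g. WBGene>
count_ids() {
    # -o prints each match on its own line; sort -u removes duplicates.
    grep -o "$2[0-9][0-9]*" "$1" | sort -u | wc -l | tr -d ' '
}
```

For example, `count_ids go_oa_WSXXX.ace WBGene` prints the number of distinct WBGene identifiers mentioned anywhere in the file, which may differ from the number of Gene objects the mirror reports.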
| |
− | --scp file to citace@spica.caltech.edu:/home/citace/Data_for_citace/Data_from_Ranjana/.
| |
− |
| |
− | The following files are submitted to the citace account on citace@spica.caltech.edu every build:
| |
− |
| |
− | To: /home/citace/Data_for_citace/Data_from_Ranjana/
| |
− |
| |
− | 1. date_WSXXX_go_dump.ace (dumped from postgres, from the manual curation via Phenote)
| |
− |
| |
− | 2. variation2goterm_VarID.ace. This is the file where allele names have been converted to WBVarIDs by Wen. Use this file until this data is read into Postgres.
| |
− |
| |
− | 3. phenotype2go_mappings.ace (consolidated phenotype2go mappings for any given build).
| |
− |
| |
− | 4. A new GO terms ontology file, generated at /home/acedb/ranjana/citace_upload/go_curation/go_obo2ace using the go_obo_to_go_ace.pl script and renamed with the upload number, e.g., go_terms_WS240.ace. (We used to submit a WSXXXGOterms.ace file that Wen dumped; it is no longer used.)
| |
− |
| |
− | All of the above files are submitted to: /home/citace/Data_for_Ontology/ at citace@spica.caltech.edu
| |
− |
| |
− | NOTE:These genes were added to the paper editor, so this file is no longer manually being put into citace.
| |
− |
| |
− | 5. WBPaper00038491_genes.ace added genes to paper connection for Daniel Shaye
| |
− |
| |
− | Change directory to: Data_for_Ontology/, under /home/citace/.
| |
− |
| |
− | Here use 'wget' to get gene_ontology_edit.obo file from
| |
− |
| |
− | http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo.
| |
− |
| |
− | Rename file in the format: gene_ontology.WS231.obo.
| |
− |
| |
− | ====Old SOP for generating a gene association file====
| |
− |
| |
− | In the acedb user account on Tazendra, at /home/acedb/ranjana/GO:
| |
− | --Use ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS211/ONTOLOGY/gene_association.WS211.wb.ce
| |
− |
| |
− | --Use 'grep IEA gene_association.WSXXX.wb.ce > gene_association.wb.electronic' to separate the IEAs.
| |
− |
| |
− | --Use 'grep WBPhenotype gene_association.WSXXX.wb.ce > gene_association.wb.rnai2go' to get the phenotype-based associations (i.e., both Erich's earlier RNAi2GO ones and the new associations based on allele phenotypes that went into WormBase WS186).
| |
− |
| |
− | --Copy the right go.go.<date> file from /home/acedb/ranjana/citace_upload/go_curation/go_dumper_files/ to this directory and change its name to gene_association.wb.manual.
| |
− |
| |
− | --new GOA elegans file, from 04.02.12, for external annots (use 'wget ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/9.C_elegans.goa')
| |
− |
| |
− | --Run the ./wrapper.pl script
| |
− | Output will include the various error types
| |
− |
| |
− | --Run ./strip_errors_and_concatenate.pl
| |
− |
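The two grep steps earlier in this list can be wrapped in one function so both subsets are always produced from the same input file. A minimal sketch; the function name `split_gaf` is hypothetical, while the output file names are the ones used in this SOP. Note that plain grep matches the pattern anywhere on a line, exactly as the original commands do.

```sh
#!/bin/sh
# Hypothetical wrapper around the two grep commands from this SOP.
# Usage: split_gaf <gene_association.WSXXX.wb.ce>
# Writes both output files into the current directory.
split_gaf() {
    # Electronic (IEA) annotations.
    grep IEA "$1" > gene_association.wb.electronic
    # Phenotype-based (RNAi2GO and allele phenotype) annotations.
    grep WBPhenotype "$1" > gene_association.wb.rnai2go
    # Print the line counts of both subsets as a quick sanity check.
    wc -l gene_association.wb.electronic gene_association.wb.rnai2go
}
```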
| |
− | --scp the generated gene association file to a local machine (Maya) for post-processing and upload to the GOC. Then, in the tmp directory on Maya:
| |
− |
| |
− | --removed 'NOT' annotations from mtm-9, vha-2, vha-3, hsp-60, hsp-12.3, hsp-12.6. (We do not take out NOT annotations anymore)
| |
− |
| |
− | --removed header from the middle of concatenated file in two places (on top of UniProt file too, search for 'gaf-version') and placed on top of file (correct minor mistake in header--space after the $ on one of the lines)
| |
− |
| |
− | --And move the following header from the middle of file to the top of file:
| |
− |
| |
− | !Version: $Revision: $
| |
− |
| |
− | !Organism: Caenorhabditis elegans
| |
− |
| |
− | !date: $Date: $
| |
− |
| |
− | !From: WormBase
| |
− |
| |
− | --Add these two lines at the bottom of header:
| |
− |
| |
− | !DataBase_Project_Name: WormBase WS215/WS216
| |
− |
| |
− | !gaf-version: 2.0
| |
− |
| |
− | --Remove the header 'gaf 2.0', from the top of the UniProt file
| |
− |
| |
− | --gzip file
| |
− |
| |
− | --Copy file to the tmp directory
| |
− |
| |
− | Use SVN commands to upload to the GO repository; also update the README file with every upload.
| |
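Part of the manual header clean-up described above (moving header lines from the middle of the concatenated file to the top and dropping repeats) can be sketched in shell. This assumes every header line starts with '!' and every annotation line does not; the function name `hoist_headers` is hypothetical. The sketch does not reorder or edit header lines, so fixing the `$` typo, adding the DataBase_Project_Name line, and placing `!gaf-version` at the bottom of the header would still be done by hand.

```sh
#!/bin/sh
# Hypothetical helper: print a concatenated association file with all
# '!' header lines first (duplicates dropped, first-seen order kept),
# followed by every non-header line in its original order.
# Usage: hoist_headers <file> > <cleaned-file>
hoist_headers() {
    grep '^!' "$1" | awk '!seen[$0]++'
    grep -v '^!' "$1"
}
```

The output goes to standard output, so the original file is left untouched and the result can be compared against it before gzipping.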