UniProtKB gpad to WormBase .ace

Revision as of 15:00, 22 July 2015

UniProtKB produces, on a weekly basis, a gpad file containing all of the C. elegans annotations currently in Protein2GO.
- A new file is available every Monday.
- The file, gp_association.6239_wormbase.gz, is located here: ftp://ftp.ebi.ac.uk/pub/contrib/goa/

Download the file from the UniProtKB FTP site and put it on tazendra in the directory for the appropriate year and month:
- /home/acedb/kimberly/citace_upload/go/gpad2ace
- for example: /home/acedb/kimberly/citace_upload/go/gpad2ace/2015_February
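The download-and-file step above can be scripted. The following is a minimal Python sketch, not an existing tool: it assumes the FTP path and file name listed above are still current, that month directories follow the `YYYY_MonthName` pattern of the `2015_February` example, and that it runs on tazendra itself. The helper names `month_dir`, `latest_monday`, and `fetch_gpad` are hypothetical.

```python
import datetime
import os
import urllib.request

# FTP location and base directory as given on this page; they may have changed since.
FTP_URL = "ftp://ftp.ebi.ac.uk/pub/contrib/goa/gp_association.6239_wormbase.gz"
BASE_DIR = "/home/acedb/kimberly/citace_upload/go/gpad2ace"

# English month names, hard-coded so the directory name does not depend on the system locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def month_dir(day: datetime.date, base: str = BASE_DIR) -> str:
    """Return the YYYY_MonthName directory for `day` (assumed naming pattern)."""
    return os.path.join(base, f"{day.year}_{MONTHS[day.month - 1]}")


def latest_monday(day: datetime.date) -> datetime.date:
    """Most recent Monday on or before `day`; a new file appears every Monday."""
    return day - datetime.timedelta(days=day.weekday())


def fetch_gpad(day: datetime.date) -> str:
    """Download the weekly gpad file into the month directory and return its path."""
    target = month_dir(day)
    os.makedirs(target, exist_ok=True)
    dest = os.path.join(target, os.path.basename(FTP_URL))
    urllib.request.urlretrieve(FTP_URL, dest)  # urllib handles ftp:// URLs
    return dest
```

Calling `fetch_gpad(datetime.date.today())` downloads the current file into this month's directory; `latest_monday` can be used to record which weekly release a download corresponds to.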

To convert the gpad file to a .ace file, you'll need:
- the gp2protein.wb file, which maps UniProtKB IDs to WBGene IDs
- the go_gpad_parser.pl script
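The actual conversion is done by go_gpad_parser.pl, which is not reproduced here. As an illustration only, here is a hypothetical Python sketch of the ID-mapping step such a conversion needs. It assumes gp2protein.wb is tab-separated with a `WB:`-prefixed gene ID in the first column and semicolon-separated `UniProtKB:` accessions in the second, that `!` starts comment lines in both files, and that unmapped rows are set aside with one extra leading column (the placeholder `NO_WBGENE` is made up). It does not write .ace output, since the .ace model is not described on this page.

```python
import gzip
from typing import Dict, Iterable, List, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a plain or gzipped text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def load_gp2protein(lines: Iterable[str]) -> Dict[str, str]:
    """Map bare UniProtKB accessions to WBGene IDs (assumed gp2protein layout)."""
    mapping: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        gene, _, proteins = line.partition("\t")
        gene_id = gene.split(":", 1)[-1]  # 'WB:WBGene00000001' -> 'WBGene00000001'
        for protein in proteins.split(";"):
            db, _, acc = protein.strip().partition(":")
            if db == "UniProtKB" and acc:
                mapping[acc] = gene_id
    return mapping


def split_gpad(lines: Iterable[str],
               mapping: Dict[str, str]) -> Tuple[List[List[str]], List[List[str]]]:
    """Attach a WBGene ID to each GPAD row whose UniProtKB ID maps.

    Returns (mapped, errors). Every output row gets one extra leading column
    (the WBGene ID, or the placeholder 'NO_WBGENE'), so GPAD column N
    becomes column N + 1.
    """
    mapped: List[List[str]] = []
    errors: List[List[str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        gene = mapping.get(cols[1]) if len(cols) > 1 and cols[0] == "UniProtKB" else None
        if gene:
            mapped.append([gene] + cols)
        else:
            errors.append(["NO_WBGENE"] + cols)
    return mapped, errors
```

For example, `mapping = load_gp2protein(open_text("gp2protein.wb"))` followed by `split_gpad(open_text("gp_association.6239_wormbase.gz"), mapping)` yields the mapped rows and a list of rows to review, much like the .err file discussed below.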

The gpad file format is documented on the GOC wiki here:
- http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format
- Note that the parsing step adds an additional column, so column numbers in the .err output file are one higher than in the gpad specification (e.g., gpad column 2 is reported as column 3)
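To translate between the specification and the .err file, a small lookup can help. This Python sketch lists the 12 GPAD 1.1 columns in the order given in the GOC specification (labels abbreviated) and assumes, per the note above, a uniform shift of one; `err_column` is a hypothetical helper.

```python
# GPAD 1.1 columns 1-12, in specification order (labels abbreviated).
GPAD_COLUMNS = (
    "DB", "DB_Object_ID", "Qualifier", "GO_ID", "DB:Reference",
    "Evidence_Code", "With/From", "Interacting_Taxon_ID", "Date",
    "Assigned_By", "Annotation_Extension", "Annotation_Properties",
)


def err_column(name: str) -> int:
    """1-based column number of a GPAD field as it appears in the .err file."""
    gpad_col = GPAD_COLUMNS.index(name) + 1  # 1-based position in the gpad file
    return gpad_col + 1                      # shifted by the one added column
```

For example, `err_column("DB_Object_ID")` gives 3, `err_column("With/From")` gives 8, and `err_column("Annotation_Extension")` gives 12, matching the column numbers cited for protein-fragment IDs and non-elegans IDs on this page.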

****IDs that correspond to protein fragments derived from translations of incomplete nucleotide submissions to the EMBL/GenBank/DDBJ databases (column 3)
*****O17540 - does not match the C. elegans proteome 100%; closest match is kin-20
*****O17541 - near-perfect match to kin-24
*****O17542 - closest match is kin-26
*****O17543 - closest match is kin-26
*****O17544 - near-perfect match to sid-3
****IDs from species other than C. elegans, used in ISS annotations (column 8) or occasionally in annotation extensions (column 12)
*****Q8WXF0