UniProtKB gpad to WormBase .ace

From WormBaseWiki

Revision as of 14:19, 22 July 2015

  • The gpad file that contains all of the C. elegans annotations currently in Protein2GO is produced by UniProtKB on a weekly basis.
  • Download the file from the UniProtKB ftp link and put it on tazendra (in the appropriate year and month directory):
    • /home/acedb/kimberly/citace_upload/go/gpad2ace
    • for example: /home/acedb/kimberly/citace_upload/go/gpad2ace/2015_February
  • To convert the gpad file to a .ace file you'll need:
    • the gp2protein.wb file, which maps UniProtKB IDs to WBGene IDs
    • go_gpad_parser.pl
  • go_gpad_parser.pl generates three files:
    • gpad_extra_column - the gpad file with the WBGene ID added as an extra column (a new column 2)
    • gpad_extra_column.err - a file that indicates:
      • which UniProtKB IDs don't map to WBGene IDs in the gp2protein.wb file, for example:
        • IDs that correspond to a transposon or retrotransposon reverse transcriptase
        • IDs that correspond to non-elegans species, used for ISS annotations or the occasional annotation extension
      • which PMIDs don't map to a WBPaper ID
      • which annotation extensions can't be mapped to the model
    • gp_annotation.ace - the .ace file for upload to citace
  • The gpad file format is documented on the GOC wiki here:
    • http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format
    • Note that because the parsing script adds an extra column, the column numbers reported in the .err output file are one higher than the corresponding column numbers in the original gpad file
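go_gpad_parser.pl itself is not shown on this page. As a rough illustration only, here is a minimal Python sketch (not the actual Perl script) of the mapping step behind gpad_extra_column and gpad_extra_column.err. It makes several assumptions. It assumes GPAD 1.1 column order, with DB_Object_ID in column 2, DB:Reference in column 5, and multiple references separated by `|`. It assumes a gp2protein.wb layout with the gene ID in column 1 and semicolon-separated UniProtKB IDs in column 2. It also assumes a hypothetical `pmid_to_paper` lookup. The function names, the error-message wording, and the choice to drop rows with unmapped UniProtKB IDs are all made up for the example.

```python
# Hypothetical sketch of the gpad -> gpad_extra_column step; NOT go_gpad_parser.pl.
from typing import Dict, Iterable, List, Tuple


def load_gp2protein(lines: Iterable[str]) -> Dict[str, str]:
    """Build a UniProtKB accession -> WBGene ID lookup.

    Assumed row shape (tab-separated, '!' lines are comments):
    WB:WBGene00000001<TAB>UniProtKB:P12345;UniProtKB:Q67890
    """
    mapping: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            continue
        gene = cols[0].split(":", 1)[-1]  # "WB:WBGene..." -> "WBGene..."
        for protein in cols[1].split(";"):
            db, _, accession = protein.strip().partition(":")
            if db == "UniProtKB" and accession:
                mapping[accession] = gene
    return mapping


def add_wbgene_column(
    gpad_lines: Iterable[str],
    uniprot_to_wbgene: Dict[str, str],
    pmid_to_paper: Dict[str, str],  # hypothetical "PMID:n" -> WBPaper ID lookup
) -> Tuple[List[str], List[str]]:
    """Return (extended gpad rows, error messages).

    The WBGene ID is inserted as the new column 2, so every original GPAD
    column from 2 onward moves one position to the right.
    """
    extended_rows: List[str] = []
    errors: List[str] = []
    for line_no, raw in enumerate(gpad_lines, start=1):
        line = raw.rstrip("\n")  # keep trailing tabs (empty GPAD columns)
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < 5:
            errors.append(f"line {line_no}: too few columns")
            continue
        accession = cols[1]  # GPAD column 2: DB_Object_ID
        gene = uniprot_to_wbgene.get(accession)
        if gene is None:
            # e.g. transposon proteins or non-elegans IDs; this sketch drops the row
            errors.append(f"line {line_no}: UniProtKB:{accession} has no WBGene ID")
            continue
        extended = [cols[0], gene] + cols[1:]
        # GPAD column 5 (DB:Reference) is column 6 of the extended row.
        for ref in extended[5].split("|"):
            if ref.startswith("PMID:") and ref not in pmid_to_paper:
                errors.append(f"line {line_no}, column 6: {ref} has no WBPaper ID")
        extended_rows.append("\t".join(extended))
    return extended_rows, errors
```

To use it, you would read gp2protein.wb and the downloaded gpad file line by line, pass them to `load_gp2protein` and `add_wbgene_column`, then write the rows to gpad_extra_column and the messages to gpad_extra_column.err. The error messages cite column 6 for references because, once the WBGene ID is inserted as the new column 2, GPAD's column 5 moves one position to the right. Generating gp_annotation.ace from the extended rows is not covered by this sketch.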