Difference between revisions of "Generating Initial GAF file for Upload to Postgres"
From WormBaseWiki
Revision as of 15:37, 5 November 2014
Initial Round of Entering Phenotype2GO-Based Annotations into Postgres
- The idea here is to generate a non-redundant set of Phenotype2GO annotations to enter into postgres. Non-redundant means that these annotations do not overlap with any existing manual annotations.
- Step 1: Retrieve the phenotype2go.wsxxx.wb file from the FTP site.
- Step 2: Using the annotations retrieved from Step 1, create a two-column table that contains the WBGene ID found in Column 2 and the PMID value found in Column 6.
- Step 3: Using the gp_association file from UniProt-GOA, first map the Column 2 value (a UniProtKB ID) to a WBGene ID using the WB gp2protein file, and then replace the UniProtKB ID in Column 2 with the corresponding WBGene ID.
- Step 4: Output a list of any UniProtKB IDs that don't map to a WBGene ID and update the gp2protein file.
- Step 5: Repeat Step 3 with an updated gp2protein file, if needed.
- Step 6: Create a second two-column table that contains the WBGene ID now found in Column 2 of the gp_association file and the PMID value found in Column 5 of the gp_association file.
- Step 7: Compare the values in each of the two tables and then output two files from the initial file of Phenotype2GO annotations: 1) a file containing those annotations that are redundant with the gp_association annotations (i.e., there is an exact match between the gene and the paper in each table), and 2) a file containing those annotations that are NOT present in the gp_association file (i.e., the gene-paper combination exists in the Phenotype2GO file but NOT in the gp_association file).
- Step 8: Upload the annotations in file #2 (non-redundant) to postgres gop_ OA tables.
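Steps 2 through 7 can be sketched in Python as shown here. This is only an illustration, not the script actually used: the function names are hypothetical, both files are assumed to be tab-separated with `!` comment lines, and the UniProtKB-to-WBGene lookup is assumed to have already been built from the gp2protein file as a plain dictionary. It also assumes that the PMID values are written identically in both files, so that an exact string comparison is meaningful.

```python
def read_rows(text):
    """Parse tab-separated text into rows, skipping blank lines and '!' comments."""
    rows = []
    for line in text.splitlines():
        if line.strip() and not line.startswith("!"):
            rows.append(line.split("\t"))
    return rows


def gene_paper_pairs(rows, gene_col, paper_col):
    """Steps 2 and 6: build a two-column (gene, paper) table; columns are 1-based."""
    return {(row[gene_col - 1], row[paper_col - 1]) for row in rows}


def map_to_wbgene(gpa_rows, uniprot_to_wbgene):
    """Steps 3 and 4: swap the Column 2 UniProtKB ID for a WBGene ID.

    Returns the rewritten rows plus a sorted list of UniProtKB IDs that had
    no entry in the (assumed) lookup dictionary.
    """
    mapped, unmapped = [], set()
    for row in gpa_rows:
        wbgene = uniprot_to_wbgene.get(row[1])
        if wbgene is None:
            unmapped.add(row[1])
        else:
            mapped.append(row[:1] + [wbgene] + row[2:])
    return mapped, sorted(unmapped)


def split_annotations(p2go_rows, manual_pairs):
    """Step 7: split Phenotype2GO rows by whether (Column 2, Column 6) is in manual_pairs."""
    redundant, non_redundant = [], []
    for row in p2go_rows:
        key = (row[1], row[5])
        if key in manual_pairs:
            redundant.append(row)
        else:
            non_redundant.append(row)
    return redundant, non_redundant
```

In this sketch, the list of unmapped IDs returned by `map_to_wbgene` corresponds to the output of Step 4, and the second list returned by `split_annotations` corresponds to file #2, the one uploaded in Step 8.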
Notes
- The assumption here is that if a paper-gene connection exists from manual annotation, the manual annotation will have covered all of the possible annotations from that paper. This may not always be true, but filtering this way is simpler than also checking the actual term used and determining parent-child (i.e., more or less granular) relations to decide which annotations to keep. Typically, the manual annotations are more granular than the Phenotype2GO annotations.
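The consequence of this assumption can be shown with a small Python sketch. The record layout, the function name, and all IDs below are placeholders invented for illustration. The point is only that the comparison key is the gene-paper pair, so the GO term is never consulted.

```python
def is_redundant(p2go_annotation, manual_annotations):
    """True when any manual annotation shares the gene and paper; the GO term is ignored."""
    key = (p2go_annotation["gene"], p2go_annotation["paper"])
    return any((m["gene"], m["paper"]) == key for m in manual_annotations)


# Placeholder records: same gene and paper, but different GO terms.
manual = [{"gene": "WBGene00000001", "paper": "PMID:1", "go_term": "GO:0000002"}]
candidate = {"gene": "WBGene00000001", "paper": "PMID:1", "go_term": "GO:0000001"}

# Filtered out as redundant even though the GO terms differ.
same_pair_different_term = is_redundant(candidate, manual)

# Kept, because no manual annotation exists for this gene-paper pair.
other_paper = is_redundant({**candidate, "paper": "PMID:2"}, manual)
```

Under this rule, a Phenotype2GO annotation is dropped whenever the gene-paper pair has any manual annotation, which is the trade-off the note describes.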
Back to 20141022_-_Phenotype2GO_Pipeline