Difference between revisions of "Adding new Phenotype2GO annotations to postgres"

From WormBaseWiki

Revision as of 13:20, 2 July 2015

  • All Phenotype2GO annotations are generated anew with each WS build; however, we only need to add the annotations from the most recent build to the OA tables in postgres.
  • The original set of annotations added to postgres came from WS247.
    • This set of annotations was first compared to the GAF (note, not the GPAD) from UniProt to avoid entering redundant annotations from manual curation.
    • Once redundant annotations were removed, the unique file of entries, newGpaEntries, was parsed and data added to the GO OA tables using the script here: /home/postgres/work/pgpopulation/go/go_curation/20141106_kevin_godata/populate_gop_OA_pheno2go.pl
  • Subsequent annotations will be generated by comparing the live and staging versions of the phenotype2go.wb files found on the ftp site (e.g. ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS247/ONTOLOGY/phenotype2go.WS247.wb and ftp://ftp.sanger.ac.uk/pub/wormbase/releases/WS248/ONTOLOGY/phenotype2go.WS248.wb) and then parsing the unique entries into postgres.
    • Caveat: a unique entry may reflect either a newly added annotation or an old annotation that was removed. The curator therefore needs to check the newGpaEntries file at each build to confirm what is being added and, if needed, to delete from postgres any annotations that were removed during the WS build.
  • Next Steps, 2015-04-09:
    • New script: will compare all columns, except the date in Column 14, in two phenotype2go.WSnnn.wb GAF files and output a new newGpaEntries file of all unique lines.
    • Script location: /home/postgres/work/pgpopulation/go/go_curation/20150410_phenotype2go_build_comparison/phenotype2go_build_comparison.pl
    • The two GAF files for generating the WS249 upload are on mangolassi here: /home/acedb/kimberly/citace_upload/go/phenotype2go/new_annotation_entries/WS249_upload
    • If needed, the curator will manually edit the newGpaEntries file (i.e., remove deleted entries).
    • The final newGpaEntries file will be uploaded to postgres using the populate_gop_OA_pheno2go.pl script.
  • 2015-04-20:
    • The new script from 2015-04-09 identified over 1000 lines that were different between WS247 and WS248, which was about an order of magnitude higher than expected.
    • One main difference seems to be that some RNAi experiments map to one gene in WS248 (and also in WS246) but to two genes in WS247. We confirmed with Hinxton that WS246 and WS248 have the correct data.
    • WS247 therefore seems to have an erroneous number of RNAi mappings, and WS248 would be a better baseline (starting point) for Phenotype2GO annotations in postgres.
    • Plan: remove the WS247 Phenotype2GO annotations from postgres and re-populate with the Phenotype2GO annotations from WS248.
    • The scripts will need to be re-run with the new data.
    • Create new directory: /home/postgres/work/pgpopulation/go/go_curation/20150416_initial_phenotype2go
    • Files needed in this directory: gene_association.goa_worm, gp2protein.wb, phenotype2go.WS248.wb
      • Files are currently on mangolassi here: /home/acedb/kimberly/citace_upload/go/phenotype2go/new_annotation_entries/WS249_upload/WS248_import
    • Scripts needed in this directory: parse_kevin_godata.pl and populate_gop_OA_pheno2go.pl
      • Scripts are currently on mangolassi here: /home/postgres/work/pgpopulation/go/go_curation/20141106_kevin_godata/
    • parse_kevin_godata.pl will create three new files, one of which, newGpaEntries, will be used to re-populate the OA.
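
The build comparison described under "Next Steps, 2015-04-09" can be illustrated with a minimal Python sketch. This is not the phenotype2go_build_comparison.pl script itself, whose internals are not shown on this page. The function names (strip_date, compare_builds) are hypothetical, and the sketch assumes tab-delimited GAF lines with the date in Column 14 and header lines starting with "!". Unlike a single list of unique lines, it reports added and removed lines separately, which may help with the caveat that a unique entry can come from either an addition or a removal.

```python
DATE_COLUMN = 14  # 1-based GAF column holding the annotation date (assumed)


def strip_date(line):
    """Return the tab-separated fields of a GAF line without the date column."""
    fields = line.rstrip("\n").split("\t")
    return tuple(f for i, f in enumerate(fields, start=1) if i != DATE_COLUMN)


def index_lines(lines):
    """Map each date-free key to the first full line that produced it."""
    keyed = {}
    for line in lines:
        text = line.rstrip("\n")
        if not text or text.startswith("!"):  # skip blank and GAF header lines
            continue
        keyed.setdefault(strip_date(text), text)
    return keyed


def compare_builds(old_lines, new_lines):
    """Compare two builds, ignoring the date column.

    Returns (added, removed): lines present only in the new build and
    lines present only in the old build, in their original order.
    """
    old = index_lines(old_lines)
    new = index_lines(new_lines)
    added = [line for key, line in new.items() if key not in old]
    removed = [line for key, line in old.items() if key not in new]
    return added, removed
```

To use it on two builds, one could open the two phenotype2go.WSnnn.wb files and pass the file handles as old_lines and new_lines. The added list then corresponds to candidate new entries, and the removed list to annotations the curator may need to delete from postgres.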