GAF to .ace file

The new pipeline for uploading manual GO annotations differs from the current one in that manual annotations to protein-coding genes will now come from UniProtKB.

Here is one possible scenario for how that could work:

Manual Annotations

  1. Download a GAF from UniProtKB for all manual annotations to protein-coding genes
  2. Add annotations to ncRNAs and uncloned genes to the GAF: dump a GAF from tazendra, diff it against the UniProt GAF, and add to the UniProt GAF any annotations that are present on tazendra but not in the UniProt GAF (we will need to check that this really gives only ncRNA and uncloned gene annotations)
  3. With the complete GAF, convert all UniProtKB identifiers to WBGene identifiers using the most current gp2protein file
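
Steps 2 and 3 could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the production script. It assumes tab-delimited GAF 2.x rows (database in column 1, object ID in column 2, symbol in column 3, GO ID in column 5, reference in column 6, evidence code in column 7) and a gp2protein file with two tab-separated columns of the form `WB:WBGene...` and `UniProtKB:...`. The comparison key and all function names are hypothetical; how rows from the two sources should be matched for the diff is a decision the pipeline would still need to make.

```python
def read_gaf(lines):
    """Return GAF data rows as lists of columns, skipping '!' comment lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")  # keep trailing tabs so empty columns survive
        if not line or line.startswith("!"):
            continue
        rows.append(line.split("\t"))
    return rows


def annotation_key(row):
    # Assumed comparison key: object symbol, GO ID, reference, evidence code
    # (GAF columns 3, 5, 6, 7). The real diff may need a different key.
    return (row[2], row[4], row[5], row[6])


def missing_from_uniprot(tazendra_rows, uniprot_rows):
    """Rows present in the tazendra GAF but not in the UniProt GAF."""
    seen = {annotation_key(r) for r in uniprot_rows}
    return [r for r in tazendra_rows if annotation_key(r) not in seen]


def read_gp2protein(lines):
    """Map UniProtKB accession -> WBGene ID (assumed two-column layout)."""
    mapping = {}
    for line in lines:
        cols = line.strip().split("\t")
        if len(cols) < 2 or cols[0].startswith("!"):
            continue
        _, _, gene_id = cols[0].partition(":")
        # Assumption: several proteins for one gene are separated by ';'.
        for protein in cols[1].split(";"):
            db, _, accession = protein.partition(":")
            if db == "UniProtKB":
                mapping[accession] = gene_id
    return mapping


def to_wbgene(row, mapping):
    """Rewrite a UniProtKB row to a WBGene ID; other rows are left unchanged."""
    if row[0] == "UniProtKB" and row[1] in mapping:
        return ["WB", mapping[row[1]]] + row[2:]
    # Unmapped UniProtKB rows pass through untouched; a real run would
    # probably want to report them for follow-up.
    return row
```

The rows returned by `missing_from_uniprot` are the ones that would need the manual check mentioned in step 2, since nothing in the sketch verifies that they are restricted to ncRNAs and uncloned genes.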

Phenotype2GO Annotations

  1. Currently, the Phenotype2GO-based annotations are incorporated into WB as part of the database build.
  2. The resulting annotations are converted to GAF format and then put on an FTP site corresponding to each build.
  3. The Phenotype2GO GAF is combined with the manual and InterPro2GO GAFs (see below) and the combined file is uploaded to the GO repository.

Proposed Change to Phenotype2GO Pipeline

  1. The current Phenotype2GO pipeline maps WBPhenotype terms to GO terms and then associates genes to GO terms if they have been annotated to that WBPhenotype term.
  2. This pipeline has allowed us to provide annotations for a number of genes that have not been intensively studied. For some well-studied genes, however, it also produces annotations to GO terms that represent processes far downstream of where these genes are known to function.
  3. To improve the quality of the Phenotype2GO annotations, we will need to facilitate some level of manual review that allows curators to decide, after reviewing the state of knowledge on a gene's function, which Phenotype2GO annotations should be included.
  4. To do this, the current bolus of Phenotype2GO-based annotations would be uploaded to the new Protein2GO curation tool.
  5. Any new Phenotype2GO annotations generated from WB phenotype annotations would be included in a separate GAF constructed with each build, but NOT included in the WB build.
  6. New Phenotype2GO annotations would be uploaded to Protein2GO to be available for review.
  7. All manual and Phenotype2GO annotations for WB would be downloaded from Protein2GO as a GAF which would be converted to a .ace file for upload to WB.
  8. Thus, the ultimate source of Phenotype2GO annotations for both WB and GO would be the annotations downloaded from Protein2GO. This would allow manual review of the Phenotype2GO annotations and provide a mechanism for improving the quality of annotations made from this pipeline.
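
The mapping that opens this list (WBPhenotype term to GO term, then gene to GO term) and the idea of setting aside only the new annotations for review could be sketched as below. This is an illustration only: the function name is hypothetical, the identifiers are arbitrary placeholders, and the pairings shown are not real mappings or real WormBase data.

```python
def phenotype2go(gene_phenotypes, phenotype_to_go):
    """Derive (gene, GO term) pairs from gene-to-phenotype annotations."""
    pairs = set()
    for gene, phenotypes in gene_phenotypes.items():
        for phenotype in phenotypes:
            # Phenotypes with no mapped GO term contribute nothing.
            for go_id in phenotype_to_go.get(phenotype, ()):
                pairs.add((gene, go_id))
    return pairs


# Placeholder inputs, chosen arbitrarily for the example.
phenotype_to_go = {
    "WBPhenotype:0000001": ["GO:0000001"],
    "WBPhenotype:0000002": ["GO:0000002", "GO:0000003"],
}
gene_phenotypes = {
    "WBGene00000001": ["WBPhenotype:0000001"],
    "WBGene00000002": ["WBPhenotype:0000002", "WBPhenotype:9999999"],
}

current = phenotype2go(gene_phenotypes, phenotype_to_go)

# Pairs assumed to be in Protein2GO already (the existing bolus).
already_uploaded = {("WBGene00000001", "GO:0000001")}

# Only these would go into the separate per-build GAF for curator review.
new_for_review = sorted(current - already_uploaded)
```

Because the derivation is purely mechanical, every gene annotated to a mapped phenotype receives the GO term, which is why the review step in Protein2GO is proposed.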


IEA Annotations

  1. Currently, the InterPro2GO (IEA) annotations are incorporated into WB as part of the database build.
  2. The resulting annotations are converted to GAF format and then put on an FTP site corresponding to each build.
  3. The InterPro2GO GAF is combined with the manual and Phenotype2GO GAFs (see above) and the combined file is uploaded to the GO repository.
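
Combining the manual, Phenotype2GO, and InterPro2GO GAFs into one file could look roughly like this. It is a minimal sketch: the function name is hypothetical, the `!gaf-version: 2.0` header is an assumption about which GAF version the combined file would declare, and dropping exact duplicate rows is a design choice made here rather than a documented requirement.

```python
def combine_gafs(*sources):
    """Concatenate GAF data rows from several sources under one header."""
    combined = ["!gaf-version: 2.0"]  # assumed version line
    seen = set()
    for lines in sources:
        for line in lines:
            line = line.rstrip("\n")
            # Drop per-file comment/header lines and blank lines.
            if not line or line.startswith("!"):
                continue
            # Skip rows that are byte-for-byte identical to one already kept.
            if line not in seen:
                seen.add(line)
                combined.append(line)
    return combined
```

Each argument is any iterable of lines, so open file handles for the three GAFs can be passed directly and the result written out line by line.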