Phenotype2GO pipeline SOP

From WormBaseWiki
Revision as of 01:58, 19 June 2009 by Rkishore

Phenotype2GO pipeline (Combination of Sanger and Caltech):

From Igor's work in January 2009:

Modifications to the script (that Sanger uses) that generates gene_association file:

Rather than excluding certain papers when generating GO terms from RNAi experiments (the so-called 'exclude list'), Ranjana and Kimberly will supply an 'include list' of papers (mostly large-scale, genome-wide studies) that they have inspected and explicitly approved for propagating RNAi phenotypes to GO terms.

To use it, invoke the script with the -includelist option, e.g. run: parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example parses only RNAi experiments; to generate the full file, also give the '-gene -var' options as before).
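The include-list idea described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the actual parse_go_terms_new.pl code (which is a Perl script): the include-file layout (one paper ID per line), the record fields, the function names, and all sample IDs are assumptions made for the example.

```python
def load_include_list(lines):
    """Return the set of paper IDs named in an include list.

    Assumption: one paper ID per line; blank lines and '#' comments are skipped.
    """
    papers = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            papers.add(entry)
    return papers


def filter_rnai_annotations(annotations, included_papers):
    """Keep only RNAi-derived annotations whose paper is on the include list."""
    return [a for a in annotations if a["paper"] in included_papers]


# Hypothetical data, for illustration only.
include_lines = [
    "WBPaper00000001",
    "",
    "# reviewed genome-wide screen",
    "WBPaper00000002",
]
annotations = [
    {"gene": "WBGene00000001", "go_term": "GO:0000001", "paper": "WBPaper00000001"},
    {"gene": "WBGene00000002", "go_term": "GO:0000002", "paper": "WBPaper00000099"},
]

included = load_include_list(include_lines)
kept = filter_rnai_annotations(annotations, included)
print([a["gene"] for a in kept])
```

The point of the design is that a paper contributes phenotype-derived GO terms only after a curator has added it to the list, so an unreviewed paper is left out by default.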

If you invoke it with the '-acefile <filename>' option, the script will also generate the Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.

The inherit_GO_terms.pl script does not consult any exclusion/inclusion files and follows a different logic from parse_go_terms_new.pl, which results in a disparity between the gene_association file and the data that ends up in acedb. The phenotype procedure of inherit_GO_terms.pl therefore needs to be disabled, and the ace file generated by the new script needs to be used instead. We do not want every gene with a phenotype to get a GO_term from the phenotype2GO mapping file, only the genes from 'genome-wide' papers that have been reviewed. To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.

Sanger's Understanding: Anthony Rogers wrote:

We should not use the inherit_go_terms -phenotype method to apply GO_terms, but rather use the new version of parse_go_terms_new.pl with the -acefile and -include list options. This will give curators control of all IMP GO_terms, so the acedb database will be the same as the gene_association file that is produced at Caltech for export to GOC.

Current status:

From Igor's e-mail, March 2009: I don't think they actually have been disabled (this is the phenotype option of the inherit_go_terms script). The script should be run without the '-variation' option, but the gene_association file still has those. Try this: grep -i wbpheno gene_association.WS200.wb.ce | grep -v RNAi
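For reference, the grep pipeline in the e-mail selects lines that mention 'wbpheno' (case-insensitively) and then drops any line containing 'RNAi' (case-sensitively). A rough Python equivalent is sketched below; the function name and the sample lines are made up for the example and do not reflect real file contents.

```python
def phenotype_lines_without_rnai(lines):
    """Mimic: grep -i wbpheno FILE | grep -v RNAi"""
    return [
        line
        for line in lines
        if "wbpheno" in line.lower() and "RNAi" not in line
    ]


# Hypothetical lines, for illustration only (not real gene_association rows).
sample = [
    "gene-A\tGO:0000001\tWBPhenotype:0000001\tWBRNAi00000001",
    "gene-B\tGO:0000002\tWBPhenotype:0000002\tWBVar00000001",
    "gene-C\tGO:0000003\tno phenotype reference",
]

remaining = phenotype_lines_without_rnai(sample)
for line in remaining:
    print(line)
```

Any line that survives this filter is a phenotype-derived annotation that did not come from an RNAi experiment, which is what the e-mail suggests should no longer be present.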

Plans:

The 'include' file will be dynamic: changes can be made at every build if required. When manual annotation provides a more granular child of a parent term that has been attached to the gene via the phenotype2GO mapping, the script needs to stop assigning the parent term to that gene at every build. How can this be achieved? Could we provide an exclude list of gene-GO_term pairs that the Sanger script consults at every build?
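One way the exclude-list idea raised above might work is sketched below. This is only an illustration of the idea, not a decided design or existing code: the child-to-parents mapping, the pair representation, the function names, and all identifiers in the sample data are hypothetical.

```python
def ancestors(term, parents):
    """Return all ancestors of a term, given a child -> list-of-parents mapping."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def build_exclude_list(manual, parents):
    """Gene/GO_term pairs made redundant by a more granular manual annotation."""
    exclude = set()
    for gene, term in manual:
        for ancestor in ancestors(term, parents):
            exclude.add((gene, ancestor))
    return exclude


def apply_exclude_list(mapped, exclude):
    """Drop phenotype2GO-derived pairs that appear on the exclude list."""
    return [pair for pair in mapped if pair not in exclude]


# Hypothetical data: placeholder names, not real genes or GO terms.
parents = {"GO:child": ["GO:parent"], "GO:parent": ["GO:root"]}
manual = [("geneA", "GO:child")]
mapped = [("geneA", "GO:parent"), ("geneB", "GO:parent")]

exclude = build_exclude_list(manual, parents)
print(apply_exclude_list(mapped, exclude))
```

In this sketch the exclude list is derived from the manual annotations rather than maintained by hand, so it could be regenerated at every build; a hand-curated list of pairs would plug into `apply_exclude_list` in the same way.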