Phenotype2GO pipeline SOP
Phenotype2GO pipeline (Combination of Sanger and Caltech):
The old Sanger script that generates the gene_association file (from Igor's work in January 2009) was changed:
- Instead of excluding some papers when attaching GO terms to genes based on phenotypes from RNAi experiments (the so-called 'exclude list'), the script is now given an 'include list' of papers (mostly large-scale, genome-wide studies). This list is curator-approved and explicitly agreed upon for the propagation of GO terms to genes based on their RNAi phenotypes.
- A new script, parse_go_terms_new.pl, is used. Invoke it with the -includelist option, e.g.: parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example parses only RNAi experiments; to generate the full file, also give the '-gene -var' options as before).
- If you invoke it with the '-acefile <filename>' option, the script will also generate the Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.
- The inherit_GO_terms.pl script does not consult any exclusion/inclusion files.
- It also follows a different logic from parse_go_terms_new.pl, which results in a disparity between the gene_association file and the data that ends up in acedb.
- The phenotype procedure of inherit_GO_terms.pl therefore needs to be disabled, and the ace file generated by the new script needs to be used instead.
- We do not want every gene with a phenotype to get a GO_term from the phenotype2GO mapping file, but just the genes from 'genome-wide' papers that have been reviewed.
- To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.
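As a rough sketch, the options described above could be combined into a single invocation that produces both the full gene_association file and the ace file. The helper name build_p2go_cmd and the ace file name phenotype2go.ace are hypothetical; the option spellings are copied from the example command above, and combining them in one run is an assumption rather than a documented usage. The helper only prints the command so that it can be reviewed before being run.

```shell
#!/bin/sh
# Sketch only: build_p2go_cmd and phenotype2go.ace are hypothetical names.
# Option spellings (-o, -rnai, -gene, -var, -include, -acefile) are taken
# from the SOP text; using them all in one run is an assumption.
build_p2go_cmd() {
    # $1: include list of curator-approved papers
    # $2: output gene_association file
    # $3: output ace file with phenotype-derived Gene-GO_term connections
    echo "parse_go_terms_new.pl -o $2 -rnai -gene -var -include $1 -acefile $3"
}

# Print the command for review instead of executing it.
build_p2go_cmd includelist.txt gene_association.wb phenotype2go.ace
```

Once the printed command looks right, it can be pasted into the shell where parse_go_terms_new.pl is available.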
Current status: From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this: grep -i wbpheno gene_association.WS200.wb.ce | grep -v RNAi
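The check suggested in the e-mail can be wrapped so that it reports a count instead of listing every line. This is a minimal sketch: the function name count_non_rnai_pheno is hypothetical, and it assumes, as the grep in the e-mail does, that phenotype-derived lines contain the string 'wbpheno' (case-insensitive) and that RNAi-derived lines contain 'RNAi'.

```shell
#!/bin/sh
# Sketch only: count_non_rnai_pheno is a hypothetical helper name.
# Counts phenotype-derived lines that do not mention RNAi, i.e. the lines
# the e-mail's grep pipeline would print.
count_non_rnai_pheno() {
    # $1: path to a gene_association file
    # grep -v -c prints the number of lines that do NOT match RNAi;
    # "|| true" keeps a zero count from being reported as a failure.
    grep -i wbpheno "$1" | grep -v -c RNAi || true
}

# Example call (file name taken from the e-mail):
# count_non_rnai_pheno gene_association.WS200.wb.ce
```

A count of 0 would suggest that no non-RNAi phenotype annotations remain in the file; any other value means such lines are still present.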