Phenotype2GO pipeline SOP
Phenotype2GO pipeline (Combination of Sanger and Caltech):
Modifications to the Sanger script that generates the gene_association file (from Igor's work in January 2009):
- Instead of excluding some papers when generating GO terms from RNAi experiments (the so-called 'exclude list'), Ranjana and Kimberly will supply an 'include list' of papers (mostly large-scale, genome-wide studies) that they have inspected and for which they have explicitly agreed to propagate GO terms to genes based on their RNAi phenotypes.
- A new script was submitted to Sanger to achieve this. To use it, invoke the script with the -includelist option, e.g.: parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example parses only RNAi experiments; to generate the full file, also give the '-gene -var' options as before).
- If you invoke it with the '-acefile <filename>' option, the script will also generate the Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.
- The inherit_GO_terms.pl script does not consult any exclusion/inclusion files.
- It follows a different logic from parse_go_terms_new.pl, which results in a disparity between the gene_association file and the data that ends up in acedb.
- So the phenotype procedure of inherit_GO_terms.pl needs to be disabled and the ace file generated by the new script needs to be used.
- We do not want every gene with a phenotype to get a GO_term from the phenotype2GO mapping file, but just the genes from 'genome-wide' papers that have been reviewed.
- To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.
- We should not use the inherit_go_terms -phenotype method to apply GO_terms, but rather the new version of parse_go_terms_new.pl with the -acefile and include list options.
- Question: What else depends on the inherit_go_terms script? The IEAs?
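The include-list behaviour described in the bullets above can be sketched roughly as follows. This is only an illustration in Python, not the code of parse_go_terms_new.pl: the file layout (one paper ID per line, optional '#' comments), the function names, the record fields and all identifiers in the sample data are assumptions made up for the example.

```python
def load_include_list(lines):
    """Return the set of paper IDs, ignoring blank lines and '#' comments.

    The one-ID-per-line layout is an assumption, not the documented format.
    """
    papers = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            papers.add(entry)
    return papers


def rnai_go_annotations(experiments, include, pheno2go):
    """Return (gene, GO term, paper) tuples for included papers only."""
    annotations = []
    for exp in experiments:
        if exp["paper"] not in include:
            continue  # paper has not been reviewed and approved
        for go_term in pheno2go.get(exp["phenotype"], []):
            annotations.append((exp["gene"], go_term, exp["paper"]))
    return annotations


# Made-up placeholder data, for illustration only.
include = load_include_list(["PAPER_A  # genome-wide screen", "", "PAPER_B"])
pheno2go = {"PHENO_1": ["GO_TERM_X"]}
experiments = [
    {"gene": "GENE_1", "phenotype": "PHENO_1", "paper": "PAPER_A"},
    {"gene": "GENE_2", "phenotype": "PHENO_1", "paper": "PAPER_C"},
]
result = rnai_go_annotations(experiments, include, pheno2go)
```

Here only GENE_1 receives a term, because PAPER_C is not on the include list. The point of the design is that a gene gets a phenotype-derived GO term only when the supporting paper has been explicitly approved.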
Current status: From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this: grep -i wbpheno gene_association.WS200.wb.ce | grep -v RNAi
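For readers less familiar with grep, the check in the e-mail can be read as the following Python sketch: keep lines that mention 'wbpheno' in any letter case, then drop those containing 'RNAi' (case-sensitive, as with plain grep -v). The function name and the sample lines are made up for illustration and do not reproduce the real gene_association format.

```python
def non_rnai_phenotype_lines(lines):
    """Mirror `grep -i wbpheno FILE | grep -v RNAi` on an iterable of lines."""
    return [
        ln for ln in lines
        if "wbpheno" in ln.lower()  # grep -i: case-insensitive match
        and "RNAi" not in ln        # grep -v: case-sensitive exclusion
    ]


# Made-up placeholder lines, for illustration only.
sample = [
    "GENE_1 GO_TERM_X WBPhenotype:PLACEHOLDER WBRNAi:PLACEHOLDER",
    "GENE_2 GO_TERM_Y WBPhenotype:PLACEHOLDER WBVar:PLACEHOLDER",
    "GENE_3 GO_TERM_Z",
]
leftover = non_rnai_phenotype_lines(sample)
```

Any line that survives this filter is a phenotype-based annotation that did not come from an RNAi experiment.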
- The 'include' file, consisting of a list of papers, will be dynamic; changes can be made at every build if required.
- When manual annotation provides a granular child of a parent term that has been attached to the gene via phenotype2GO mapping, and the curators feel the automatic term should be removed, the script needs to stop assigning the parent term to the same gene at every build. How can this be achieved?
- Provide an exclude list of gene-GO_term assignments that the Sanger script will consult at every build?
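If the exclude-list idea were adopted, one possible shape is sketched below in Python. Nothing here exists in the Sanger script: the file layout (gene ID and GO term separated by whitespace, '#' for comments), the function names and all identifiers are hypothetical.

```python
def load_exclusions(lines):
    """Return a set of (gene, GO term) pairs that must not be auto-assigned.

    Assumed layout: two whitespace-separated columns, '#' starts a comment line.
    """
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            pairs.add((fields[0], fields[1]))
    return pairs


def drop_excluded(annotations, exclusions):
    """Remove automatic annotations whose (gene, GO term) pair is excluded."""
    return [a for a in annotations if (a[0], a[1]) not in exclusions]


# Made-up placeholder data, for illustration only.
exclusions = load_exclusions(["# gene  GO term", "GENE_1\tGO_PARENT"])
automatic = [
    ("GENE_1", "GO_PARENT", "PAPER_A"),
    ("GENE_2", "GO_PARENT", "PAPER_A"),
]
kept = drop_excluded(automatic, exclusions)
```

Keying the exclusion on the gene-term pair rather than on the term alone means the parent term is suppressed only for the gene that has the more granular manual annotation, while other genes still inherit it.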