Difference between revisions of "Phenotype2GO pipeline SOP"

From WormBaseWiki
 
(3 intermediate revisions by 2 users not shown)
Line 1: Line 1:
'''Phenotype2GO pipeline (Combination of Sanger and Caltech):'''
+
=====Phenotype2GO pipeline (Sanger and Caltech)=====
  
'''Modifications to the Sanger script that generates the gene_association file (from Igor's work in January 2009):'''
+
*The old Sanger script that generates the gene_association file (from Igor's work in January 2009) was changed. Instead of an exclusion list, an 'include list' of papers (mostly large-scale genome-wide studies) is provided to the script. This list is curator-approved: the curators have explicitly agreed that GO terms may be propagated to genes based on the RNAi phenotypes reported in these papers.
  
*Instead of excluding some papers while generating GO terms from RNAi experiments (the so-called 'exclude list'), Ranjana and Kimberly will instead supply an 'include list' that comprises papers (mostly large scale genome-wide studies) that they inspected and explicitly agreed on propagating GO terms to genes based on their RNAi phenotypes.
+
*A new script is used. To use it, invoke the script with the -includelist option, e.g.: parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example parses only RNAi experiments; to generate the full file, you should also give the '-gene -var' options as before).
 
 
*A new script was submitted to Sanger to achieve this. To use it invoke the script with the -includelist option, e.g.: Run parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example only parses RNAi experiments, to generate full file, you should also give '-gene -var' options as before).
 
  
 
*If you invoke it with the '-acefile <filename>' option, the script will also generate Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.
  
*This script: inherit_GO_terms.pl does not consult any exclusion/inclusion files
+
*The old script, inherit_GO_terms.pl, does not consult any exclusion/inclusion files.
**It follows a different logic from parse_go_terms_new.pl, which results in a disparity between the gene_association file and the data that ends up in acedb.
 
**So the phenotype procedure of inherit_GO_terms.pl needs to be disabled and the ace file generated by the new script needs to be used.
 
**We do not want every gene with a phenotype to get a GO_term from the phenotype2GO mapping file, but just the genes from 'genome-wide' papers that have been reviewed.
 
**To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.  
 
  
'''Sanger's Understanding:'''
+
*To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.  
*We should not use the inherit_go_terms -phenotype method to apply GO_terms, but rather use the new version of parse_go_terms_new.pl with the -acefile and -include list options.
 
*Question:  What else depends on the inherit_go_terms script? The IEAs?
 
  
'''Current status:'''
+
*Current status: From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this:
From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled.  The script should be run without the '-variation' option, but the gene_association file still has those. Try this:
 
 
grep -i wbpheno gene_association.WS200.wb.ce |grep -v RNAi
 
 +
This is now resolved.
 +
 +
==Phenotype2GO mappings==
 +
 +
  
'''Plans:'''
+
[[Category:Curation]]
*The 'include' file, consisting of a list of papers, will be dynamic; changes can be made at every build if required.
 
*When manual annotation provides a granular child of a parent term that has been attached to the gene via phenotype2GO mapping, and the curators feel the automatic term should be removed, the script needs to stop assigning the parent term to the same gene at every build. How can this be achieved?
 
*Provide an exclude list of gene-GO_term assignments that the Sanger script will consult at every build?
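
The exclude-list idea raised above could look roughly like the following. This is only a sketch under stated assumptions, not the Sanger script's behaviour: the function name filter_excluded_pairs and the exclude-file format (one gene ID and one GO term ID per line, tab-separated) are hypothetical, and the gene_association file is assumed to be tab-separated with the gene ID in column 2 and the GO term ID in column 5.

```shell
#!/bin/sh
# Sketch: drop gene_association lines whose (gene ID, GO ID) pair is on an
# exclude list. Everything here is illustrative, not the pipeline's own code.
# Hypothetical exclude-file format: "GeneID<TAB>GO_ID", one pair per line.
# Assumed association-file layout: column 2 = gene ID, column 5 = GO term ID.
filter_excluded_pairs() {
    # $1: exclude list, $2: gene_association file
    awk -F '\t' '
        # First file: remember every excluded gene/GO pair.
        FILENAME == ARGV[1] { skip[$1 FS $2] = 1; next }
        # Second file: keep comment/header lines untouched.
        /^!/ { print; next }
        # Print only annotation lines whose pair is not on the list.
        !(($2 FS $5) in skip)
    ' "$1" "$2"
}
```

With an exclude file containing a single pair, the function passes through every line of the association file except those annotating that gene with that GO term, so a parent term removed by a curator would not reappear at the next build as long as the pair stays on the list.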
 

Latest revision as of 04:28, 8 September 2010

Phenotype2GO pipeline (Sanger and Caltech)
  • The old Sanger script that generates the gene_association file (from Igor's work in January 2009) was changed. Instead of an exclusion list, an 'include list' of papers (mostly large-scale genome-wide studies) is provided to the script. This list is curator-approved: the curators have explicitly agreed that GO terms may be propagated to genes based on the RNAi phenotypes reported in these papers.
  • A new script is used. To use it, invoke the script with the -includelist option, e.g.: parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example parses only RNAi experiments; to generate the full file, you should also give the '-gene -var' options as before).
  • If you invoke it with the '-acefile <filename>' option, the script will also generate Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.
  • The old script, inherit_GO_terms.pl, does not consult any exclusion/inclusion files.
  • To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.
  • Current status: From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this:

grep -i wbpheno gene_association.WS200.wb.ce | grep -v RNAi

This is now resolved.
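
The grep check from the e-mail can be wrapped in a small helper so it is easy to repeat at each build. This is a minimal sketch: the function name count_non_rnai_pheno is hypothetical, and it assumes, as the original command does, that any line mentioning a WBPhenotype ID but not the string RNAi is a phenotype annotation that did not come from an RNAi experiment.

```shell
#!/bin/sh
# Sketch: count phenotype-based annotation lines that do not mention RNAi.
# Same logic as the two chained grep commands in the e-mail, plus a line count.
count_non_rnai_pheno() {
    # $1: path to a gene_association file
    # First grep keeps lines with a WBPhenotype ID (case-insensitive),
    # second grep drops the RNAi-derived ones, wc counts what is left.
    grep -i 'wbpheno' "$1" | grep -v 'RNAi' | wc -l | tr -d ' '
}
```

It would be called as count_non_rnai_pheno gene_association.WS200.wb.ce; a count of 0 would suggest that no non-RNAi phenotype annotations remain in the file.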

Phenotype2GO mappings