Difference between revisions of "Phenotype2GO pipeline SOP"

From WormBaseWiki
(New page: '''Phenotype2GO pipeline (Combination of Sanger and Caltech):''' '''From Igor's work in January 2009:''' Modifications to the script (that Sanger uses) that generates gene_association fi...)
 
 
(5 intermediate revisions by 2 users not shown)
Line 1: Line 1:
'''Phenotype2GO pipeline (Combination of Sanger and Caltech):'''
+
=====Phenotype2GO pipeline (Sanger and Caltech)=====
  
'''From Igor's work in January 2009:'''
+
*The old Sanger script that generates the gene_association file was changed (Igor's work, January 2009). Instead of an exclusion list, an 'include list' of papers (mostly large-scale genome-wide studies) is now provided to the script. This list is curator-approved: the papers on it were explicitly agreed upon for the propagation of GO terms to genes based on their RNAi phenotypes.
  
Modifications to the script (that Sanger uses) that generates gene_association file:
+
*A new script is used. To use it, invoke the script with the -includelist option, e.g.: Run parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt. This example parses only RNAi experiments; to generate the full file, also give the '-gene -var' options as before.
  
Instead of excluding some papers while generating GO terms from RNAi experiments (the so-called 'exclude list'), Ranjana and Kimberly will instead supply an 'include list' that comprises papers (mostly large scale genome-wide studies) that they inspected and explicitly agreed on propagating RNAi phenotypes to GO terms.  
+
*If you invoke it with the '-acefile <filename>' option, the script will also generate Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.
  
To use it invoke the script with the -includelist option, e.g.: Run parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example only parses RNAi experiments, to generate full file, you should also give '-gene -var' options as before).
+
*The old script: inherit_GO_terms.pl does not consult any exclusion/inclusion files.
  
If you invoke it with '-acefile <filename>' option, the script will also generate Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.  
+
*To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.  
  
This script: inherit_GO_terms.pl does not consult any exclusion/inclusion files
+
*Current status: From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this:
Follows a different logic from parse_go_terms_new.pl, this results in disparity between the gene_association file and the data that ends up in acedb.
+
grep -i wbpheno gene_association.WS200.wb.ce |grep -v RNAi
So the phenotype procedure of inherit_GO_terms.pl needs to be disabled and the ace file generated by the new script needs to be used.
+
This is now resolved.
We do not want every gene with a phenotype to get a GO_term from the phenotype2GO mapping file, but just the genes from 'genome-wide' papers that have been reviewed.
 
To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.  
 
 
 
'''Sanger's Understanding: Anthony Rogers wrote:'''
 
 
 
We should not use the inherit_go_terms -phenotype method to apply GO_terms, but rather use the new version of parse_go_terms_new.pl with the -acefile and -include list options.
 
This will then give curators control of all IMP GO_terms so the acedb database will be the same as the gene_association file that is produced at Caltech for export to GOC.
 
  
'''Current status:'''
+
==Phenotype2GO mappings==
  
From Igor's e-mail, March 2009: I don't think they actually have been disabled (this is the phenotype option of the inherit_go_terms script). The script should be run without the '-variation' option, but the gene_association file still has those. Try this:
 
grep -i wbpheno gene_association.WS200.wb.ce |grep -v RNAi
 
  
'''Plans:'''
 
  
The 'include' file will be dynamic; changes can be made at every build if required.
+
[[Category:Curation]]
When manual annotation provides a granular child of a parent term that has been attached to the gene via phenotype2GO mapping, the script needs to stop assigning the parent term to the same gene at every build. How can this be achieved? Provide an exclude list of gene-GO_term pairs that the Sanger script will consult every build?
 

Latest revision as of 04:28, 8 September 2010

Phenotype2GO pipeline (Sanger and Caltech)
  • The old Sanger script that generates the gene_association file was changed (Igor's work, January 2009). Instead of an exclusion list, an 'include list' of papers (mostly large-scale genome-wide studies) is now provided to the script. This list is curator-approved: the papers on it were explicitly agreed upon for the propagation of GO terms to genes based on their RNAi phenotypes.
  • A new script is used. To use it, invoke the script with the -includelist option, e.g.: Run parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt. This example parses only RNAi experiments; to generate the full file, also give the '-gene -var' options as before.
  • If you invoke it with the '-acefile <filename>' option, the script will also generate Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.
  • The old script: inherit_GO_terms.pl does not consult any exclusion/inclusion files.
  • To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.
  • Current status: From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this:

grep -i wbpheno gene_association.WS200.wb.ce | grep -v RNAi

This is now resolved.
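
For readers who want to run the same check from a script, the grep pipeline above can be sketched in Python. This is only an illustration: the function name and the sample lines are made up for the example, and the lines are treated as plain text exactly as grep does, with no assumption about the column layout of the real gene_association file.

```python
def phenotype_lines_without_rnai(lines):
    """Mimic `grep -i wbpheno | grep -v RNAi` on an iterable of text lines."""
    return [
        line
        for line in lines
        if "wbpheno" in line.lower()  # grep -i: case-insensitive match
        and "RNAi" not in line        # grep -v: case-sensitive exclusion
    ]


# Made-up sample lines, NOT real gene_association records.
sample = [
    "WB\tgene-1\tGO:0000001\tIMP\tWBPhenotype:0000001|WBRNAi00000001",
    "WB\tgene-2\tGO:0000002\tIMP\tWBPhenotype:0000002|WBVar00000001",
    "WB\tgene-3\tGO:0000003\tIDA\t",
]

leftover = phenotype_lines_without_rnai(sample)
for line in leftover:
    print(line)
```

Any line that is returned mentions a phenotype but not RNAi, which is the kind of entry the e-mail says should no longer appear. To check a real file, pass an open file handle instead of the sample list.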

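The include-list behaviour described in the first bullet can be pictured with a small Python sketch. It is not the logic of parse_go_terms_new.pl, which is a Perl script whose internals are not shown on this page. The function name, the record layout (gene, phenotype, paper) and all identifiers below are hypothetical placeholders chosen for illustration.

```python
def propagate_go_terms(rnai_records, phenotype2go, include_papers):
    """Return sorted (gene, GO term) pairs for RNAi records from included papers.

    rnai_records   -- iterable of (gene, phenotype, paper) tuples (assumed layout)
    phenotype2go   -- dict mapping a phenotype to a list of GO terms
    include_papers -- set of curator-approved paper identifiers
    """
    pairs = set()
    for gene, phenotype, paper in rnai_records:
        if paper not in include_papers:
            continue  # paper not on the include list: nothing is propagated
        for go_term in phenotype2go.get(phenotype, ()):
            pairs.add((gene, go_term))
    return sorted(pairs)


# Hypothetical placeholder data, not real WormBase identifiers.
include_papers = {"paper-A"}
phenotype2go = {
    "phenotype-1": ["GO-term-X"],
    "phenotype-2": ["GO-term-Y"],
}
rnai_records = [
    ("gene-1", "phenotype-1", "paper-A"),  # included paper, mapped phenotype
    ("gene-2", "phenotype-2", "paper-B"),  # paper not on the include list
    ("gene-3", "phenotype-3", "paper-A"),  # included paper, unmapped phenotype
]

result = propagate_go_terms(rnai_records, phenotype2go, include_papers)
print(result)
```

Only gene-1 receives a GO term here. The point of the design is that a gene gets a phenotype-derived GO term only when the supporting paper has been reviewed and placed on the include list, rather than whenever a phenotype happens to have a mapping.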
Phenotype2GO mappings