Difference between revisions of "Phenotype2GO pipeline SOP"

From WormBaseWiki
(New page: '''Phenotype2GO pipeline (Combination of Sanger and Caltech):''' '''From Igor's work in January 2009:''' Modifications to the script (that Sanger uses) that generates gene_association fi...)
 
Line 1: Line 1:
 
'''Phenotype2GO pipeline (Combination of Sanger and Caltech):'''
 
'''Phenotype2GO pipeline (Combination of Sanger and Caltech):'''
  
'''From Igor's work in January 2009:'''
+
'''Modifications to the Sanger script that generates the gene_association file (from Igor's work in January 2009):'''
  
Modifications to the script (that Sanger uses) that generates gene_association file:
+
*Instead of excluding some papers while generating GO terms from RNAi experiments (the so-called 'exclude list'), Ranjana and Kimberly will instead supply an 'include list' that comprises papers (mostly large scale genome-wide studies) that they inspected and explicitly agreed on propagating RNAi phenotypes to GO terms.
  
Instead of excluding some papers while generating GO terms from RNAi experiments (the so-called 'exclude list'), Ranjana and Kimberly will instead supply an 'include list' that comprises papers (mostly large scale genome-wide studies) that they inspected and explicitly agreed on propagating RNAi phenotypes to GO terms.  
+
*A new script was submitted to Sanger to achieve this. To use it invoke the script with the -includelist option, e.g.: Run parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example only parses RNAi experiments, to generate full file, you should also give '-gene -var' options as before).
  
To use it invoke the script with the -includelist option, e.g.: Run parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example only parses RNAi experiments, to generate full file, you should also give '-gene -var' options as before).
+
*If you invoke it with '-acefile <filename>' option, the script will also generate Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.  
  
If you invoke it with '-acefile <filename>' option, the script will also generate Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.  
+
*This script: inherit_GO_terms.pl does not consult any exclusion/inclusion files
 +
**Follows a different logic from parse_go_terms_new.pl, this results in disparity between the gene_association file and the data that ends up in acedb.  
 +
**So the phenotype procedure of inherit_GO_terms.pl needs to be disabled and the ace file generated by the new script needs to be used.
 +
**We do not want every gene with a phenotype to get a GO_term from the phenotype2GO mapping file, but just the genes from 'genome-wide' papers that have been reviewed.
 +
**To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.  
  
This script: inherit_GO_terms.pl does not consult any exclusion/inclusion files
+
'''Sanger's Understanding:'''
Follows a different logic from parse_go_terms_new.pl, this results in disparity between the gene_association file and the data that ends up in acedb.
+
*We should not use the inherit_go_terms -phenotype method to apply GO_terms, but rather use the new version of parse_go_terms_new.pl with the -acefile and include-list options.
So the phenotype procedure of inherit_GO_terms.pl needs to be disabled and the ace file generated by the new script needs to be used.
 
We do not want every gene with a phenotype to get a GO_term from the phenotype2GO mapping file, but just the genes from 'genome-wide' papers that have been reviewed.
 
To alter Sanger's version of parse_go_terms_new.pl, a patch file was provided.
 
 
 
'''Sanger's Understanding: Anthony Rogers wrote:'''
 
 
 
We should not use the inherit_go_terms -phenotype  method to apply GO_terms, but rather use the new version of parse_go_terms_new.pl with the -acefile and- include list options.
 
This will then give curators control of all IMP GO_terms so the acedb database will be the same as the gene_association file that is produced at Caltech for export to GOC.
 
  
 
'''Current status:'''
 
'''Current status:'''
 
+
From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this:
From Igor's e-mail, March 2009: I don't think they actually have been disabled (this is the phenotype option of the inherit_go_terms script). The script should be run without the '-variation' option, but the gene_association file still has those. Try this:
 
 
grep -i wbpheno gene_association.WS200.wb.ce |grep -v RNAi
 
grep -i wbpheno gene_association.WS200.wb.ce |grep -v RNAi
  
 
'''Plans:'''
 
'''Plans:'''
 
+
*The 'include' file consisting of a list of papers will be dynamic, changes can be made if required every build.
The 'include' file will be dynamic, changes can be made if required every build.
+
*When manual annotation provides a granular child of a parent term that has been attached to the gene via phenotype2GO mapping, and the curators feel the automatic term should be removed, the script needs to stop assigning the parent term to the same gene at every build. How can this be achieved?  
When manual anntotation provides a granular child of a parent term that has been attached to the gene via phenotype2GO mapping, the script needs to stop assigning the parent term to the same gene at every build. How can this be achieved? Provide an exclude list of gene-GO_term that Sanger script will consult every build?
+
*Provide an exclude list of gene-GO_term assignments that Sanger script will consult every build?

Revision as of 14:16, 22 June 2009

Phenotype2GO pipeline (Combination of Sanger and Caltech):

Modifications to the Sanger script that generates the gene_association file (from Igor's work in January 2009):

  • Instead of excluding some papers while generating GO terms from RNAi experiments (the so-called 'exclude list'), Ranjana and Kimberly will supply an 'include list' of papers (mostly large-scale, genome-wide studies) that they have inspected and explicitly approved for propagating RNAi phenotypes to GO terms.
  • A new script was submitted to Sanger to achieve this. To use it, invoke the script with the -includelist option, e.g. run: parse_go_terms_new.pl -o gene_association.wb -rnai -include includelist.txt (this example parses only RNAi experiments; to generate the full file, also give the '-gene -var' options as before).
  • If you invoke it with the '-acefile <filename>' option, the script will also generate Gene-GO_term connections derived from phenotypes. This is currently done by the phenotype procedure of the inherit_GO_terms.pl script.
  • The inherit_GO_terms.pl script does not consult any exclusion/inclusion files.
    • It follows a different logic from parse_go_terms_new.pl, which results in a disparity between the gene_association file and the data that ends up in acedb.
    • So the phenotype procedure of inherit_GO_terms.pl needs to be disabled, and the ace file generated by the new script needs to be used instead.
    • We do not want every gene with a phenotype to get a GO_term from the phenotype2GO mapping file, only the genes from 'genome-wide' papers that have been reviewed.
    • A patch file was provided to alter Sanger's version of parse_go_terms_new.pl.
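
The include-list behaviour described above can be sketched roughly as follows. This is only a minimal Python illustration of the idea, not the code of parse_go_terms_new.pl: the function name, the record layout, and all paper, gene, phenotype and GO identifiers are hypothetical placeholders.

```python
def rnai_go_annotations(rnai_records, include_papers, phenotype2go):
    """Return sorted (gene, GO term) pairs, using only RNAi results from included papers.

    rnai_records   -- iterable of (paper, gene, phenotype) tuples (assumed layout)
    include_papers -- set of paper identifiers approved by the curators
    phenotype2go   -- dict mapping a phenotype to a list of GO terms
    """
    pairs = set()
    for paper, gene, phenotype in rnai_records:
        if paper not in include_papers:
            continue  # paper was not reviewed and approved, so nothing is propagated
        for go_term in phenotype2go.get(phenotype, ()):
            pairs.add((gene, go_term))
    return sorted(pairs)


# Hypothetical example data (placeholders, not real WormBase identifiers).
include_papers = {"PAPER_A"}
phenotype2go = {"PHENO_1": ["GO_TERM_X"], "PHENO_2": ["GO_TERM_Y"]}
rnai_records = [
    ("PAPER_A", "GENE_1", "PHENO_1"),  # included paper, mapped phenotype
    ("PAPER_B", "GENE_2", "PHENO_1"),  # paper not on the include list
    ("PAPER_A", "GENE_3", "PHENO_3"),  # phenotype without a GO mapping
]

annotations = rnai_go_annotations(rnai_records, include_papers, phenotype2go)
print(annotations)
```

In this sketch only GENE_1 receives a GO term, because it is the only record that comes from an included paper and has a mapped phenotype.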

Sanger's Understanding:

  • We should not use the inherit_go_terms -phenotype method to apply GO_terms, but rather use the new version of parse_go_terms_new.pl with the -acefile and include-list options.

Current status:

From Igor's e-mail, March 2009: I don't think the phenotype option of the inherit_go_terms script has been disabled. The script should be run without the '-variation' option, but the gene_association file still has those. Try this:

grep -i wbpheno gene_association.WS200.wb.ce | grep -v RNAi
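
For anyone who prefers to run the same check from a script, the grep pipeline can be approximated in Python as below. This is a sketch under assumptions: the helper name is made up, and the sample lines are simplified placeholders rather than real gene_association records. As with the grep command, the phenotype match is case-insensitive and the RNAi filter is case-sensitive.

```python
import re

WBPHENO = re.compile("wbpheno", re.IGNORECASE)  # mirrors `grep -i wbpheno`


def phenotype_lines_without_rnai(lines):
    """Keep lines that mention a WBPhenotype but do not contain 'RNAi' (mirrors `grep -v RNAi`)."""
    return [line for line in lines if WBPHENO.search(line) and "RNAi" not in line]


# Simplified placeholder lines, not the real file format.
sample_lines = [
    "GENE_1 GO_TERM_X WBPhenotype:PLACEHOLDER_1 WBRNAi:PLACEHOLDER",
    "GENE_2 GO_TERM_Y WBPhenotype:PLACEHOLDER_2 WBVar:PLACEHOLDER",
    "GENE_3 GO_TERM_Z PAPER_REF",
]

leftover = phenotype_lines_without_rnai(sample_lines)
print(len(leftover), "phenotype-derived line(s) without RNAi evidence")
```

To check a real file, the lines could be read with open("gene_association.WS200.wb.ce") and passed to the same function; any line it returns is a phenotype-derived entry that does not come from an RNAi experiment.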

Plans:

  • The 'include' file, consisting of a list of papers, will be dynamic; changes can be made at every build if required.
  • When manual annotation provides a granular child of a parent term that has been attached to the gene via phenotype2GO mapping, and the curators feel the automatic term should be removed, the script needs to stop assigning the parent term to the same gene at every build. How can this be achieved?
  • Provide an exclude list of gene-GO_term assignments that the Sanger script will consult at every build?
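
One way the proposed exclude list could work is sketched below. This is purely illustrative and assumes a design that has not been agreed: the function name, the pair-based list format, and the gene and GO identifiers are all hypothetical.

```python
def apply_exclusions(auto_pairs, excluded_pairs):
    """Drop automatically generated (gene, GO term) pairs that curators have excluded."""
    excluded = set(excluded_pairs)
    return [pair for pair in auto_pairs if pair not in excluded]


# Pairs produced by the phenotype2GO mapping in one build (placeholder identifiers).
auto_pairs = [
    ("GENE_1", "GO_PARENT"),
    ("GENE_2", "GO_PARENT"),
]

# Curator-maintained list: GENE_1 already has a more granular manual annotation,
# so the automatic parent term should no longer be assigned to it.
excluded_pairs = [("GENE_1", "GO_PARENT")]

kept = apply_exclusions(auto_pairs, excluded_pairs)
print(kept)
```

Because the exclusion is keyed on the gene-GO_term pair rather than on the term alone, GENE_2 keeps the parent term while GENE_1 does not, and the same list can be consulted again at every build.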