New GO Progress Report Script

From WormBaseWiki

Revision as of 18:39, 6 March 2014

GO now requires quarterly progress reports; the first one is due at this month's meeting (2014-03-16).

We've wanted to provide a more detailed progress report for GO for some time, so this is a good opportunity to do that.

Here's one idea for C. elegans manual annotations:

Input files:

  • gp2protein.wb
  • gp_association.wb

Processing steps:

  1. Ignore all lines with the IEA evidence code (Column 7)
  2. Replace the UniProtKB identifier in Column 2 with the corresponding WBGene ID, using gp2protein.wb
  3. Remove (i.e., ignore for further reporting) any resulting lines that are exact duplicates of another annotation line
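
The three processing steps could be sketched in Python as below. This is not an existing WormBase script. The column positions come from the steps on this page; the gp2protein.wb layout (a `WB:WBGene...` ID, a tab, then a semicolon-separated list such as `UniProtKB:A;UniProtKB:B`), the `!` comment lines, and the literal `IEA` string in Column 7 are assumptions. The function names `load_gp2protein`, `preprocess` and `strip_prefix` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# 0-based indexes for the columns named in the steps above.
COL_OBJECT_ID = 1            # Column 2: UniProtKB identifier
COL_EVIDENCE = 6             # Column 7: evidence code
EXCLUDED_EVIDENCE = {"IEA"}  # step 1; assumes GO evidence codes, not ECO IDs


def strip_prefix(identifier: str) -> str:
    """Drop a leading 'DB:' prefix, e.g. 'UniProtKB:Q19834' -> 'Q19834'."""
    return identifier.split(":", 1)[-1]


def load_gp2protein(lines: Iterable[str]) -> Dict[str, str]:
    """Map each UniProtKB accession to a WBGene ID.

    Assumed layout: 'WB:WBGene...' <tab> 'UniProtKB:A;UniProtKB:B;...'
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header comment or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # malformed line: no protein column
        gene = strip_prefix(fields[0])
        for protein in fields[1].split(";"):
            if protein.strip():
                mapping[strip_prefix(protein.strip())] = gene
    return mapping


def preprocess(
    assoc_lines: Iterable[str], mapping: Dict[str, str]
) -> Tuple[List[Tuple[str, ...]], List[str]]:
    """Apply steps 1-3; return (unique mapped rows, lines that could not be mapped)."""
    kept: List[Tuple[str, ...]] = []
    seen: Set[Tuple[str, ...]] = set()
    unmapped: List[str] = []
    for line in assoc_lines:
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[COL_EVIDENCE] in EXCLUDED_EVIDENCE:
            continue  # step 1: drop IEA annotations
        wbgene = mapping.get(strip_prefix(fields[COL_OBJECT_ID]))
        if wbgene is None:
            unmapped.append(line.rstrip("\n"))  # kept for the final report
            continue
        fields[COL_OBJECT_ID] = wbgene  # step 2: UniProtKB -> WBGene
        row = tuple(fields)
        if row not in seen:  # step 3: skip exact duplicates
            seen.add(row)
            kept.append(row)
    return kept, unmapped
```

With the real files, both functions accept open file objects, e.g. `with open("gp2protein.wb") as f: mapping = load_gp2protein(f)`. Because IEA lines are dropped before the ID lookup, this sketch only reports unmapped identifiers from non-IEA annotations; if Column 7 actually holds ECO IDs, `EXCLUDED_EVIDENCE` would need the ECO terms that correspond to IEA.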

Then determine:

  1. Total number of unique annotations
  2. Total number of unique WBGenes
  3. For each value of the qualifier (Column 4), count the number of annotations per evidence code (Column 7) and the number of annotations with an entry in Column 12
  4. Sort the results by the unique entries in Column 10 (i.e., by contributing group)
  5. Also report any lines whose UniProtKB identifier cannot be converted to a WBGene ID
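
The reporting steps could be sketched as follows. This is a hypothetical Python outline, not an existing script. It assumes the input is the de-duplicated, WBGene-mapped annotation rows (tuples of tab-separated fields) left after the processing steps above, plus the list of lines whose identifier could not be mapped. The column positions follow this page; the names `summarize` and `print_report` and the `(none)` label for an empty qualifier are illustrative choices.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# 0-based indexes for the columns named in the steps above.
COL_GENE = 1       # Column 2: WBGene ID after the mapping step
COL_QUALIFIER = 3  # Column 4: qualifier
COL_EVIDENCE = 6   # Column 7: evidence code
COL_GROUP = 9      # Column 10: contributing group
COL_EXTRA = 11     # Column 12: counted when non-empty


def summarize(rows: Sequence[Tuple[str, ...]]) -> dict:
    """Compute totals and per-group, per-qualifier counts.

    Assumes rows are already de-duplicated and mapped to WBGene IDs.
    """
    per_group: Dict[str, Dict[str, dict]] = defaultdict(dict)
    for row in rows:
        qualifier = row[COL_QUALIFIER] or "(none)"  # label for an empty qualifier
        stats = per_group[row[COL_GROUP]].setdefault(
            qualifier, {"evidence": Counter(), "with_col12": 0}
        )
        stats["evidence"][row[COL_EVIDENCE]] += 1
        if len(row) > COL_EXTRA and row[COL_EXTRA].strip():
            stats["with_col12"] += 1
    return {
        "annotations": len(rows),                       # step 1
        "genes": len({row[COL_GENE] for row in rows}),  # step 2
        # steps 3-4: groups in sorted order, each with its qualifier breakdown
        "by_group": {group: per_group[group] for group in sorted(per_group)},
    }


def print_report(summary: dict, unmapped: List[str]) -> None:
    """Print the summary, then the unmapped lines (step 5)."""
    print(f"Unique annotations: {summary['annotations']}")
    print(f"Unique WBGenes: {summary['genes']}")
    for group, qualifiers in summary["by_group"].items():
        print(f"\n{group}")
        for qualifier in sorted(qualifiers):
            stats = qualifiers[qualifier]
            codes = ", ".join(
                f"{code}={n}" for code, n in sorted(stats["evidence"].items())
            )
            print(f"  {qualifier}: {codes}; with Column 12 entry: {stats['with_col12']}")
    print(f"\nLines with unmapped UniProtKB IDs: {len(unmapped)}")
    for line in unmapped:
        print(f"  {line}")
```

Printing keeps the sketch short; the same `summarize` result could instead be written out as one table per contributing group for the quarterly report.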