New GO Progress Report Script
GO now requires quarterly progress reports, with the first one due at the meeting this month (2014-03-16).
We have wanted to provide a more detailed progress report for GO for some time, so this is a good opportunity to do that.
Here's one idea for C. elegans manual annotations:
- Ignore all lines with the IEA evidence code
- Map the UniProtKB identifiers in Column 2 to WBGene IDs using gp2protein.wb
- Remove (i.e., ignore for further reporting) any resulting lines that exactly duplicate another annotation line
- Report the total number of unique annotations (i.e., lines in the file)
- Report the total number of unique WBGenes
- For each value in the qualifier column (Column 4), count the number of annotations for each evidence code in Column 7 and the number of annotations with an entry in Column 12
- Sort the results by the unique entries in Column 10 (i.e., by contributing group)
- Also report on any lines where the UniProtKB identifier cannot be converted to a WBGene
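The filtering and mapping steps in the list (dropping IEA lines, converting Column 2 via gp2protein.wb, and collecting lines that cannot be converted) might be sketched in Python as follows. This is a minimal sketch, not an existing script: the function names are hypothetical, and it assumes that gp2protein.wb has two tab-separated columns (a `WB:WBGene...` ID, then one or more `UniProtKB:` IDs separated by `;`) and that the annotation file is tab-separated with Column 1 naming the database of the Column 2 ID.

```python
from typing import Dict, Iterable, List, Tuple


def load_gp2protein(lines: Iterable[str]) -> Dict[str, str]:
    """Build a UniProtKB accession -> WBGene ID lookup from gp2protein.wb lines.

    Assumed layout: 'WB:WBGene...' <TAB> 'UniProtKB:AAA;UniProtKB:BBB'.
    """
    uniprot_to_gene: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):  # skip blank and '!' comment lines
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        gene_id = fields[0].split(":", 1)[-1]  # 'WB:WBGene00000001' -> 'WBGene00000001'
        for protein_id in fields[1].split(";"):
            db, _, accession = protein_id.strip().partition(":")
            if db == "UniProtKB" and accession:
                # If an accession is listed under several genes, the last one wins.
                uniprot_to_gene[accession] = gene_id
    return uniprot_to_gene


def map_annotations(
    gaf_lines: Iterable[str], uniprot_to_gene: Dict[str, str]
) -> Tuple[List[List[str]], List[str]]:
    """Drop IEA lines and convert UniProtKB IDs in Column 2 to WBGene IDs.

    Returns (mapped_rows, unconverted_lines).
    """
    mapped: List[List[str]] = []
    unconverted: List[str] = []
    for line in gaf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):  # header/comment lines
            continue
        cols = line.split("\t")
        if len(cols) < 7:  # too short to carry an evidence code; skipped here
            continue
        if cols[6] == "IEA":  # Column 7: evidence code
            continue
        if cols[0] == "UniProtKB":  # assumption: Column 1 names the Column 2 database
            gene_id = uniprot_to_gene.get(cols[1])
            if gene_id is None:
                unconverted.append(line)  # kept for the "cannot be converted" report
                continue
            # Assumption: relabel Column 1 too, so converted lines can match
            # native WB lines when exact duplicates are removed later.
            cols[0], cols[1] = "WB", gene_id
        mapped.append(cols)
    return mapped, unconverted
```

Lines whose UniProtKB ID has no gp2protein.wb entry are returned separately rather than dropped silently, so they can be written out for the last reporting step.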
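The counting steps in the list could look something like the following sketch. It is a hypothetical helper, not part of any GO tooling, and it assumes its input rows (lists of column strings) have already had IEA lines removed and Column 2 converted to WBGene IDs. Column positions are taken directly from the list; if the input is GAF 2.0, note that the contributing group (Assigned By) is Column 15 there, so the constants should be checked against the file actually being processed.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence

# 0-based indices for the 1-based column numbers used in the list.
QUALIFIER_COL = 3  # Column 4
EVIDENCE_COL = 6   # Column 7
GROUP_COL = 9      # Column 10 (contributing group, as written in the list)
EXTRA_COL = 11     # Column 12
GENE_COL = 1       # Column 2, assumed to hold WBGene IDs already


def summarize(rows: Sequence[Sequence[str]]) -> dict:
    """Count annotations and genes, then break counts down by group and qualifier."""
    # Remove exact duplicate lines, keeping the first occurrence of each.
    unique = list(dict.fromkeys(tuple(r) for r in rows))

    # group -> qualifier -> {"evidence": Counter of codes, "with_col12": int}
    by_group: Dict[str, Dict[str, dict]] = defaultdict(dict)
    for r in unique:
        qualifier = r[QUALIFIER_COL] or "(none)"  # empty qualifier gets a label
        stats = by_group[r[GROUP_COL]].setdefault(
            qualifier, {"evidence": Counter(), "with_col12": 0}
        )
        stats["evidence"][r[EVIDENCE_COL]] += 1
        if len(r) > EXTRA_COL and r[EXTRA_COL].strip():
            stats["with_col12"] += 1

    return {
        "annotations": len(unique),
        "genes": len({r[GENE_COL] for r in unique}),
        # Order the per-group breakdown by group name.
        "by_group": {g: by_group[g] for g in sorted(by_group)},
    }
```

Empty qualifiers are counted under `(none)`, and a multi-part qualifier such as `NOT|contributes_to` is treated as one value; splitting it on `|` would be an alternative design.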