Difference between revisions of "Gene - GO Curation Status"

From WormBaseWiki

Revision as of 18:16, 15 March 2019

Specifications for Gene - GO Curation Status Form

  • Specifications for a weekly script that generates a web page listing the C. elegans protein-coding genes that lack GO annotation for one or more aspects (BP, MF, CC).

Input Files

Output

  • A web page that lists genes that do not have annotation for one or more aspects of GO.
  • The page will be sectioned according to BP, MF, and CC.
  • There would be links at the top of the page to each section (since the page could be long).
  • Each section will have two columns:
    • The value in gin_locus_name and the WBGene id
    • The number of references in postgres that are Type:Journal_article and list that gene in pap_gene
      • 7
      • The number in the reference column would link to a page listing the matching papers, each of which links, in turn, to its page in the paper editor.
      • This is the same as the result of a gene search in the paper editor, e.g.: SELECT joinkey, pap_gene FROM pap_gene WHERE pap_gene = '00014200'
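
The page layout described above could be sketched roughly as follows. This is only an illustration, not the production script: the function name `render_page`, the input shape (a dict of aspect to `(locus_name, wbgene_id, reference_count)` tuples), and the `gene_papers.cgi` link target are all hypothetical placeholders, and the reference counts are assumed to have been looked up in postgres beforehand.

```python
from html import escape

ASPECTS = ["BP", "MF", "CC"]

def render_page(missing, paper_list_url="gene_papers.cgi?gene={gene_id}"):
    """Build the HTML page from {aspect: [(locus_name, wbgene_id, n_refs), ...]}.

    paper_list_url is a hypothetical URL pattern for the page that lists
    the matching papers; substitute the real one.
    """
    parts = ["<html><body>"]
    # Links at the top of the page to each section.
    links = " | ".join('<a href="#{0}">{0}</a>'.format(a) for a in ASPECTS)
    parts.append("<p>" + links + "</p>")
    for aspect in ASPECTS:
        parts.append('<h2 id="{0}">{0}</h2>'.format(aspect))
        # Two columns: gene (locus name and WBGene id), reference count.
        parts.append("<table><tr><th>Gene</th><th>References</th></tr>")
        for locus, gene_id, n_refs in missing.get(aspect, []):
            href = escape(paper_list_url.format(gene_id=gene_id), quote=True)
            parts.append(
                "<tr><td>{0} ({1})</td>"
                '<td><a href="{2}">{3}</a></td></tr>'.format(
                    escape(locus), escape(gene_id), href, int(n_refs)
                )
            )
        parts.append("</table>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Keeping the rendering separate from the file parsing and database lookups would make it easier to change the layout later without touching the comparison logic.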

Script Details

  • The script would look at all of the C. elegans genes annotated for a given aspect in the snapshot gpad file and compare that to the list of protein-coding genes in the gpi file.
  • Genes that are in the gpi file but have no annotation for a given aspect (BP, MF, or CC) would be listed in the section of the page for that aspect.
    • We'll treat each aspect independently, i.e. a gene may have no BP annotation but have an MF annotation, and it would still be listed in the BP section.
    • Since aspect is not captured in the gpad file, we'll need to infer it from the relations listed in column 3 of the gpad file:
      • BP relations:
        • involved_in
        • acts_upstream_of*
        • acts_upstream_of_or_within*
      • MF relations:
        • enables
        • contributes_to
      • CC relations:
        • part_of
        • colocalizes_with
  • For now, we are only interested in protein-coding genes, so in the gpi file we only want to look at genes that have an entry beginning with UniProtKB: in column 8
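
A minimal sketch of the comparison described above, under several assumptions that are not stated in the specification: both files are tab-delimited with header/comment lines starting with `!`; the gene identifier is in column 2 of both files; a trailing `*` in the relation list is a prefix wildcard; and column 3 of the gpad or column 8 of the gpi may hold several `|`-separated values. All function names are hypothetical.

```python
# Relation-to-aspect mapping taken from the list above.
# ASSUMPTION: a trailing "*" means "any relation starting with this prefix".
ASPECT_RELATIONS = {
    "BP": ["involved_in", "acts_upstream_of*", "acts_upstream_of_or_within*"],
    "MF": ["enables", "contributes_to"],
    "CC": ["part_of", "colocalizes_with"],
}

def aspect_of(relation):
    """Return "BP", "MF", or "CC" for a gpad relation, or None if unknown."""
    for aspect, patterns in ASPECT_RELATIONS.items():
        for pattern in patterns:
            if pattern.endswith("*"):
                if relation.startswith(pattern[:-1]):
                    return aspect
            elif relation == pattern:
                return aspect
    return None

def rows(handle):
    """Yield tab-split rows, skipping blank lines and "!" header lines."""
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue
        yield line.rstrip("\n").split("\t")

def protein_coding_genes(gpi_handle):
    """Genes whose gpi column 8 has an entry beginning with UniProtKB:."""
    genes = set()
    for cols in rows(gpi_handle):
        if len(cols) >= 8 and any(
            entry.startswith("UniProtKB:") for entry in cols[7].split("|")
        ):
            genes.add(cols[1])  # ASSUMPTION: gene id is in column 2
    return genes

def annotated_genes(gpad_handle):
    """Map each aspect to the set of genes with an annotation for it."""
    annotated = {aspect: set() for aspect in ASPECT_RELATIONS}
    for cols in rows(gpad_handle):
        if len(cols) < 3:
            continue
        for relation in cols[2].split("|"):
            aspect = aspect_of(relation)
            if aspect is not None:
                annotated[aspect].add(cols[1])
    return annotated

def missing_by_aspect(gpi_handle, gpad_handle):
    """Protein-coding genes lacking annotation, treated per aspect."""
    coding = protein_coding_genes(gpi_handle)
    annotated = annotated_genes(gpad_handle)
    return {a: sorted(coding - annotated[a]) for a in ASPECT_RELATIONS}
```

Because the column positions and the relation list are kept in one place, they should be straightforward to adjust if the gpad/gpi format or the relations change.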


Future Proofing

  • The gpad/gpi file format is likely going to change a bit sometime in the coming year, so we will need to make some modifications to the script.
  • There may also be some changes to the CC relations, but that would be reflected in the overall changes to the file.