Difference between revisions of "Gene - GO Curation Status"
From WormBaseWiki
Revision as of 18:17, 15 March 2019
Specifications for Gene - GO Curation Status Form
- Specifications for a weekly script that generates a web page listing the C. elegans protein-coding genes that lack GO annotation for one or more aspects (BP, MF, CC).
Input Files
- http://snapshot.geneontology.org/annotations/wb.gpad.gz
- ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.PRJNA13758.current_development.gene_product_info.gpi.gz
Output
- A web page that lists genes that do not have annotation for one or more aspects of GO.
- The page will be sectioned according to BP, MF, and CC.
- There would be links at the top of the page to each section (since the page could be long).
- Each section will have two columns:
- The value in gin_locus_name and the WBGene id
- mmcm-1 (WBGene00014202)
- the gene name and id would together link to the corresponding page in WB (https://wormbase.org/species/c_elegans/gene/WBGene00014202)
- The number of references in postgres that are Type:Journal_article and list that gene in pap_gene
- 7
- the number in the reference column would link to a page listing the matching papers that are, in turn, linked to their page in the paper editor
- this is the same as the result of performing a gene search in the paper editor: SELECT joinkey, pap_gene FROM pap_gene WHERE pap_gene = '00014202'
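The page layout described above could be sketched roughly as follows. This is only an illustration, not the actual script: the function name `render_page`, the input structure `rows_by_aspect`, and the `papers_url` pattern are all hypothetical, and the real link to the paper listing would need to be filled in.

```python
from html import escape

# The three GO aspects, in the order the page is sectioned.
ASPECTS = (
    ("BP", "Biological Process"),
    ("MF", "Molecular Function"),
    ("CC", "Cellular Component"),
)
GENE_URL = "https://wormbase.org/species/c_elegans/gene/{}"


def render_page(rows_by_aspect, papers_url="papers?gene={}"):
    """Build the HTML page as a string.

    rows_by_aspect maps an aspect code to a list of
    (locus_name, wbgene_id, paper_count) tuples.
    papers_url is a placeholder pattern (an assumption), not a real endpoint.
    """
    parts = ["<html><body>"]
    # Links at the top of the page, one per section.
    nav = " | ".join(f'<a href="#{code}">{code}</a>' for code, _ in ASPECTS)
    parts.append(f"<p>{nav}</p>")
    for code, label in ASPECTS:
        parts.append(f'<h2 id="{code}">{label} ({code})</h2>')
        parts.append("<table><tr><th>Gene</th><th>References</th></tr>")
        for locus, gene_id, n_papers in rows_by_aspect.get(code, []):
            # Column 1: gene name and id, together linking to the WB gene page.
            gene_href = escape(GENE_URL.format(gene_id))
            gene_cell = f'<a href="{gene_href}">{escape(locus)} ({escape(gene_id)})</a>'
            # Column 2: reference count, linking to the list of matching papers.
            ref_href = escape(papers_url.format(gene_id))
            ref_cell = f'<a href="{ref_href}">{n_papers}</a>'
            parts.append(f"<tr><td>{gene_cell}</td><td>{ref_cell}</td></tr>")
        parts.append("</table>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

For example, `render_page({"BP": [("mmcm-1", "WBGene00014202", 7)]})` would produce a page with one row in the BP section and empty MF and CC sections.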
Script Details
- The script would look at all of the C. elegans genes annotated for a given aspect in the snapshot gpad file and compare that to the list of protein-coding genes in the gpi file.
- Genes that are in the gpi file but have no annotation for a given aspect (BP, MF, or CC) would be listed in the section of the page for that aspect.
- We'll treat each aspect independently, i.e., a gene that has an MF annotation but no BP annotation would still be listed in the BP section.
- Since aspect is not captured in the gpad file, we'll need to use the relations listed in column 3 of the gpad file.
- BP relations:
- involved_in
- acts_upstream_of*
- acts_upstream_of_or_within*
- MF relations:
- enables
- contributes_to
- CC relations:
- part_of
- colocalizes_with
- For now, we are only interested in protein-coding genes, so in the gpi file we only want to look at genes that have an entry beginning with 'UniProtKB:' in column 8
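The comparison described above might look something like the following sketch. Several details are assumptions rather than part of the spec: the gene id is taken from column 2 of both files, the trailing `*` on the two acts_upstream_of relations is read as a prefix wildcard, and annotations carrying a NOT qualifier are counted like any other annotation. All function names are hypothetical.

```python
from collections import defaultdict

# Relation-to-aspect mapping taken from the lists above.
EXACT_RELATIONS = {
    "involved_in": "BP",
    "enables": "MF",
    "contributes_to": "MF",
    "part_of": "CC",
    "colocalizes_with": "CC",
}
# Assumption: "acts_upstream_of*" and "acts_upstream_of_or_within*" are
# wildcards; this single prefix covers both.
BP_PREFIXES = ("acts_upstream_of",)


def aspect_for_qualifier(qualifier):
    """Return 'BP', 'MF', 'CC', or None for a gpad column 3 value."""
    # The column may hold several pipe-separated tokens, e.g. "NOT|enables".
    for token in qualifier.split("|"):
        if token in EXACT_RELATIONS:
            return EXACT_RELATIONS[token]
        if token.startswith(BP_PREFIXES):
            return "BP"
    return None


def annotated_genes_by_aspect(gpad_lines):
    """Map each aspect to the set of gene ids annotated for it."""
    annotated = defaultdict(set)
    for line in gpad_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue
        aspect = aspect_for_qualifier(cols[2])  # column 3: relation
        if aspect:
            annotated[aspect].add(cols[1])  # column 2: gene id (assumed)
    return annotated


def protein_coding_genes(gpi_lines, id_col=1, xref_col=7):
    """Return ids of genes with a 'UniProtKB:' entry in column 8.

    Column indexes are 0-based; id_col is an assumption about the file layout.
    """
    genes = set()
    for line in gpi_lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= max(id_col, xref_col):
            continue
        if any(x.startswith("UniProtKB:") for x in cols[xref_col].split("|")):
            genes.add(cols[id_col])
    return genes


def missing_by_aspect(gpad_lines, gpi_lines):
    """For each aspect, list protein-coding genes with no annotation."""
    annotated = annotated_genes_by_aspect(gpad_lines)
    coding = protein_coding_genes(gpi_lines)
    return {a: sorted(coding - annotated[a]) for a in ("BP", "MF", "CC")}
```

Both functions accept any iterable of text lines, so the gzipped input files could be passed in directly as handles from `gzip.open(path, "rt")`. If the gene id or cross-reference sits in a different column in the actual files, only `id_col`, `xref_col`, and the `cols[1]` lookup would need adjusting.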
Future Proofing
- The gpad/gpi file format is likely going to change a bit sometime in the coming year, so we will need to make some modifications to the script.
- There may also be some changes to the CC relations, but that would be reflected in the overall changes to the file.