UniProt Paper - Gene - Data Type
Original file generated for UniProt:
http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/uniprot.cgi
What we currently supply:
WBGene WBPaperID PMID
Not sure how this is generated; I believe this was done before my time as paper curator.
The proposed update to the file will add the data types curated for a given gene in each paper.
We would need to add a Category column:
WBGene WBPaperID PMID Category
The Categories would be gene-specific and we will supply information for:
GO
PPI (Protein-Protein Interaction)
Phenotype
Disease
Expression
Sequence
An example:
WBGene00003508 WBPaper00003680 pmid10517638 GO;Phenotype;Disease;Expression
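A minimal sketch of how rows in the proposed format could be assembled, assuming the curated data has already been reduced to (gene, paper, category) triples. The function name `build_rows`, the input shapes, and the single-space separator (copied from the example line) are assumptions for illustration, not part of any existing script. How to handle papers without a PMID is left open; here the field is simply left empty.

```python
from collections import defaultdict

# Category order as listed in this document.
CATEGORY_ORDER = ["GO", "PPI", "Phenotype", "Disease", "Expression", "Sequence"]


def build_rows(annotations, pmid_by_paper, sep=" "):
    """Collapse (gene, paper, category) triples into one row per gene-paper pair.

    annotations:   iterable of (gene_id, paper_id, category) tuples (hypothetical shape)
    pmid_by_paper: dict mapping WBPaper IDs to PMID strings (hypothetical shape)
    """
    grouped = defaultdict(set)
    for gene, paper, category in annotations:
        if category not in CATEGORY_ORDER:
            raise ValueError(f"unknown category: {category}")
        grouped[(gene, paper)].add(category)

    rows = []
    for gene, paper in sorted(grouped):
        # Emit categories in the fixed document order, joined with ';'.
        cats = [c for c in CATEGORY_ORDER if c in grouped[(gene, paper)]]
        pmid = pmid_by_paper.get(paper, "")  # empty when no PMID is known
        rows.append(sep.join([gene, paper, pmid, ";".join(cats)]))
    return rows
```

Called with the four categories from the example line, this would reproduce that line; duplicate triples collapse because categories are collected in a set.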
Strategy: There are several possible strategies; it is not yet clear which is best.
Is it easiest to get everything from WS, or from a mixture of WS and postgres?
Some things, like GO, RNAi and Variation Phenotypes, need to come from WS.
Possibilities:
1) Start with the Paper object and trace the information in the objects xref'ed in its Refers_to tag. This works for everything but Disease.
2) Look at each object in each relevant class. This seems computationally very intensive.
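Possibility 1 could be sketched roughly as follows, with plain Python dicts standing in for objects already fetched from WS. The record layouts, the class-to-category mapping, and the name `triples_from_papers` are hypothetical; nothing here is a real WS or postgres API. The sketch ignores per-class filters (for example, only Physical interactions count for PPI) and omits Disease, which is not reachable from Refers_to.

```python
# Category contributed by each xref'ed class (assumed mapping for illustration).
# Disease is absent because it cannot be reached from the Paper's Refers_to tag.
CLASS_TO_CATEGORY = {
    "GO_annotation": "GO",
    "Interaction": "PPI",
    "RNAi": "Phenotype",
    "Variation": "Phenotype",
    "Expr_pattern": "Expression",
}


def triples_from_papers(papers, objects):
    """Yield (gene, paper, category) by walking each paper's Refers_to xrefs.

    papers:  {paper_id: {class_name: [object_id, ...]}}  (hypothetical layout)
    objects: {object_id: [gene_id, ...]}                  (hypothetical layout)
    """
    for paper_id, refers_to in papers.items():
        for class_name, object_ids in refers_to.items():
            category = CLASS_TO_CATEGORY.get(class_name)
            if category is None:
                continue  # class is not relevant to the UniProt file
            for object_id in object_ids:
                for gene_id in objects.get(object_id, []):
                    yield gene_id, paper_id, category
```

Each paper is visited once and only its own xrefs are followed, which is why this approach should be cheaper than scanning every object in every relevant class.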
Relevant tags in the different object models:
GO:
?GO_annotation -> Gene -> Reference
PPI:
?Interaction -> Interaction_type Physical -> Interactor_overlapping_gene -> Paper
Phenotype:
?RNAi -> Inhibits -> Phenotype -> Reference
?Variation -> Affects -> Phenotype -> Reference
Expression:
?Expr_pattern -> Expression_of -> Gene -> Reference
Sequence:
?Variation -> Affects -> (any one of these filled in: Nonsense, Missense, Silent, Splice_site, Frameshift, Readthrough) -> Reference
Disease:
?Gene -> Disease_info -> Experimental -> Evidence -> Paper_evidence
?Gene -> Disease_info -> Disease_relevance -> Evidence -> Paper_evidence
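The per-object rules for Variation and Interaction in the tag paths above could be checked along these lines. This is only a sketch: it assumes each object has been flattened to a set of tag names filled in for one gene, which is a hypothetical representation, and the function names are illustrative.

```python
# Tags under ?Variation -> Affects that indicate a sequence-level change,
# taken from the Sequence entry above.
SEQUENCE_TAGS = {
    "Nonsense", "Missense", "Silent",
    "Splice_site", "Frameshift", "Readthrough",
}


def variation_categories(filled_tags):
    """Return the categories one Variation supports for one gene.

    filled_tags: set of tag names filled in under Affects
                 (hypothetical flat representation of the object).
    """
    categories = set()
    if "Phenotype" in filled_tags:
        categories.add("Phenotype")
    # Sequence applies when any one of the molecular-change tags is filled in.
    if SEQUENCE_TAGS & set(filled_tags):
        categories.add("Sequence")
    return categories


def counts_as_ppi(interaction_types):
    """Only interactions typed as Physical count toward PPI."""
    return "Physical" in interaction_types
```

A single Variation can therefore contribute both Phenotype and Sequence for the same gene and paper.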