UniProt Paper - Gene - Data Type
Original file generated for UniProt:
http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/uniprot.cgi
Updates to file will now include data types curated for a given gene.
What we currently supply:
WBGene WBPaperID PMID
Not sure how this is generated - before my time.
What we need to add:
WBGene WBPaperID PMID Category
Is it easiest to get everything from WS, or from a mixture of WS and Postgres?
Some things, like GO, RNAi and Variation Phenotypes, need to come from WS.
The Categories would be gene-specific and we will supply information for:
GO;PPI;Phenotype;Disease;Expression;Sequence
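A rough sketch of how the new Category column could be attached to the existing three-column rows. Everything about the layout here is an assumption, not the format of the current file: tab-separated columns, all categories for a gene-paper pair joined with semicolons in one field, and the function name `build_rows` is made up for illustration. The IDs in the usage example are placeholders.

```python
from collections import defaultdict

# Category labels from the list above; the order is only used for stable output.
CATEGORIES = ["GO", "PPI", "Phenotype", "Disease", "Expression", "Sequence"]


def build_rows(existing_rows, curated):
    """Attach a Category field to each (WBGene, WBPaperID, PMID) row.

    existing_rows: iterable of (wbgene, wbpaper, pmid) tuples
    curated:       iterable of (wbgene, wbpaper, category) tuples
    Returns a list of tab-separated lines (assumed output layout).
    """
    by_pair = defaultdict(set)
    for gene, paper, category in curated:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        by_pair[(gene, paper)].add(category)

    rows = []
    for gene, paper, pmid in existing_rows:
        found = by_pair.get((gene, paper), set())
        # Categories are gene-specific: only those curated for this gene in this paper.
        label = ";".join(c for c in CATEGORIES if c in found)
        rows.append("\t".join([gene, paper, pmid, label]))
    return rows


if __name__ == "__main__":
    # Placeholder identifiers, not real records.
    existing = [("WBGene00000001", "WBPaper00000001", "111")]
    curated = [
        ("WBGene00000001", "WBPaper00000001", "PPI"),
        ("WBGene00000001", "WBPaper00000001", "GO"),
    ]
    for line in build_rows(existing, curated):
        print(line)
```

A pair with no curated data types gets an empty Category field in this sketch; whether such rows should be kept or dropped is still an open choice.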
Strategy: Several possible strategies - not sure which is best.
1) Start with the Paper object and trace the data types in the Refers_to tag. This works for everything but Disease.
2) Look at each object in each relevant class. This seems computationally very intensive.
How to map this onto our data types from each WS release:
GO:
?GO_annotation -> Gene -> Reference
PPI:
?Interaction -> Interaction_type Physical -> Interactor_overlapping_gene -> Paper
Phenotype:
?RNAi -> Inhibits -> Phenotype -> Reference
?Variation -> Affects -> Phenotype -> Reference
Expression:
?Expr_pattern -> Expression_of -> Gene -> Reference
Sequence:
?Variation -> Affects -> (any one of these filled in: Nonsense, Missense, Silent, Splice_site, Frameshift, Readthrough) -> Reference
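The Sequence rule (a variation counts if any one of the listed molecular-change tags is filled in) could be sketched as below. The record layout is hypothetical: a plain dict with an "Affects" mapping from gene ID to its filled-in tags and a "Reference" list of paper IDs. It is not how WS actually stores the data, and `sequence_pairs` is an invented name.

```python
# Tags from the Sequence path above; any one being filled in is enough.
SEQUENCE_TAGS = (
    "Nonsense", "Missense", "Silent",
    "Splice_site", "Frameshift", "Readthrough",
)


def sequence_pairs(variation):
    """Return (gene, paper, "Sequence") tuples for one variation record.

    variation: dict in an assumed layout, e.g.
        {"Affects": {gene_id: {tag: value, ...}}, "Reference": [paper_id, ...]}
    """
    pairs = []
    for gene, effects in variation.get("Affects", {}).items():
        # Keep the gene only if at least one molecular-change tag has a value.
        if any(effects.get(tag) for tag in SEQUENCE_TAGS):
            for paper in variation.get("Reference", []):
                pairs.append((gene, paper, "Sequence"))
    return pairs


if __name__ == "__main__":
    # Placeholder record, not real data.
    record = {
        "Affects": {
            "WBGene00000001": {"Missense": "placeholder"},
            "WBGene00000002": {},
        },
        "Reference": ["WBPaper00000001"],
    }
    print(sequence_pairs(record))
```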
Disease:
?Gene -> Disease_info -> Experimental -> Evidence -> Paper_evidence
?Gene -> Disease_info -> Disease_relevance -> Evidence -> Paper_evidence
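Disease is the one category that starts from the Gene rather than from a data-type object, and its papers can come from either the Experimental or the Disease_relevance branch. A minimal sketch of taking the union of both branches follows. It assumes a hypothetical dict layout in which each branch holds a list of entries carrying a "Paper_evidence" list; `disease_pairs` is an illustrative name only.

```python
# The two Disease_info branches named in the paths above.
DISEASE_BRANCHES = ("Experimental", "Disease_relevance")


def disease_pairs(gene_id, disease_info):
    """Return sorted (gene, paper, "Disease") tuples for one gene.

    disease_info: dict in an assumed layout, e.g.
        {"Experimental": [{"Paper_evidence": [paper_id, ...]}, ...],
         "Disease_relevance": [{"Paper_evidence": [paper_id, ...]}, ...]}
    """
    papers = set()
    for branch in DISEASE_BRANCHES:
        for entry in disease_info.get(branch, []):
            papers.update(entry.get("Paper_evidence", []))
    # A paper cited in both branches is reported once.
    return sorted((gene_id, paper, "Disease") for paper in papers)


if __name__ == "__main__":
    # Placeholder record, not real data.
    info = {
        "Experimental": [{"Paper_evidence": ["WBPaper00000002"]}],
        "Disease_relevance": [
            {"Paper_evidence": ["WBPaper00000001", "WBPaper00000002"]}
        ],
    }
    print(disease_pairs("WBGene00000001", info))
```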