UniProt Paper - Gene - Data Type

From WormBaseWiki
Revision as of 18:21, 20 May 2015

Original file generated for UniProt:

http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/uniprot.cgi

What we currently supply:

WBGene WBPaperID PMID

I am not sure how this file is generated; I believe it was set up before my time as paper curator.


The proposed update to the file would add the data types curated for each gene.

We would need to add:

WBGene WBPaperID PMID Category


The categories would be gene-specific, and we would supply information for:

GO

PPI (Protein-Protein Interaction)

Phenotype

Disease

Expression

Sequence


An example:

    WBGene00003508  WBPaper00003680  pmid10517638  GO;Phenotype;Disease;Expression 
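A minimal sketch of how one line of the proposed file could be written and read back. The tab delimiter, the function names, and the fixed category order (taken from the list above) are assumptions for illustration; the page itself only shows whitespace-separated columns with categories joined by ";".

```python
# Category order follows the list on this page; treating it as fixed is an assumption.
CATEGORY_ORDER = ["GO", "PPI", "Phenotype", "Disease", "Expression", "Sequence"]


def format_row(gene, paper, pmid, categories):
    """Return one line: WBGene, WBPaperID, PMID, Category (tab-separated, assumed)."""
    unknown = set(categories) - set(CATEGORY_ORDER)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    # Emit categories in a stable order so files from different releases diff cleanly.
    ordered = [c for c in CATEGORY_ORDER if c in categories]
    return "\t".join([gene, paper, pmid, ";".join(ordered)])


def parse_row(line):
    """Split a line produced by format_row back into its four fields."""
    gene, paper, pmid, cats = line.rstrip("\n").split("\t")
    return gene, paper, pmid, (cats.split(";") if cats else [])


row = format_row(
    "WBGene00003508",
    "WBPaper00003680",
    "pmid10517638",
    {"Expression", "GO", "Disease", "Phenotype"},
)
```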



Strategy: There are several possible strategies; it is not yet clear which is best.

Is it easiest to get everything from WS, or from a mixture of WS and postgres?

Some things, like GO, RNAi and Variation Phenotypes, need to come from WS.

Possibilities:

1) Start with the Paper object and then trace the information in the objects cross-referenced under the Refers_to tag. This works for everything but Disease.

2) Look at each object in each relevant class. This seems computationally very intensive.
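Possibility 1 can be sketched roughly as follows. This is only an illustration over plain dictionaries, not a query against WS: the input shapes, the function name, and the object IDs in the usage lines are hypothetical. Disease would still have to be collected separately, as noted above.

```python
from collections import defaultdict


def categories_from_papers(refers_to, object_annotations):
    """Collect categories per (gene, paper) pair by walking out from each paper.

    refers_to: {paper_id: [(class_name, object_id), ...]}
        Hypothetical flattened view of each Paper's Refers_to cross-references.
    object_annotations: {(class_name, object_id): [(gene_id, category), ...]}
        Hypothetical view of what each referenced object says about a gene.
    """
    result = defaultdict(set)
    for paper_id, refs in refers_to.items():
        for key in refs:
            # Objects we know nothing about contribute no categories.
            for gene_id, category in object_annotations.get(key, []):
                result[(gene_id, paper_id)].add(category)
    return dict(result)


# Usage with made-up object IDs:
refers_to = {"WBPaper00003680": [("RNAi", "rnai_1"), ("Expr_pattern", "expr_1")]}
object_annotations = {
    ("RNAi", "rnai_1"): [("WBGene00003508", "Phenotype")],
    ("Expr_pattern", "expr_1"): [("WBGene00003508", "Expression")],
}
pairs = categories_from_papers(refers_to, object_annotations)
```

The cost here scales with the number of paper cross-references rather than with the total number of objects in every class, which is the main difference from possibility 2.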


Relevant tags in the different object models:


GO:

    ?GO_annotation -> Gene
                   -> Reference


PPI:

    ?Interaction -> Interaction_type Physical
                 -> Interactor_overlapping_gene
                 -> Paper


Phenotype:

    ?RNAi -> Inhibits
          -> Phenotype
          -> Reference
    ?Variation -> Affects
               -> Phenotype
               -> Reference


Expression:

    ?Expr_pattern -> Expression_of -> Gene
                  -> Reference


Sequence:

    ?Variation -> Affects
               -> Nonsense       }
               -> Missense       }
               -> Silent         }  any one of these filled in
               -> Splice_site    }
               -> Frameshift     }
               -> Readthrough    }
               -> Reference
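The "any one of these filled in" rule could be checked along these lines. The dictionary representation of a Variation and the function name are assumptions for illustration; requiring Affects and Reference to be present as well is my reading of the tag list, since both are needed to produce a gene-paper pair.

```python
# Molecular-change tags listed above; one filled-in tag is enough.
SEQUENCE_TAGS = (
    "Nonsense",
    "Missense",
    "Silent",
    "Splice_site",
    "Frameshift",
    "Readthrough",
)


def is_sequence_annotation(variation):
    """Return True if a Variation qualifies for the Sequence category.

    variation: dict mapping tag name -> value (hypothetical flattened view);
    a tag counts as "filled in" when it is present with a non-empty value.
    """
    has_gene = bool(variation.get("Affects"))
    has_paper = bool(variation.get("Reference"))
    has_change = any(variation.get(tag) for tag in SEQUENCE_TAGS)
    return has_gene and has_paper and has_change
```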

Disease:

    ?Gene -> Disease_info -> Experimental -> Evidence -> Paper_evidence
          -> Disease_info -> Disease_relevance -> Evidence -> Paper_evidence
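The tag lists above can be gathered into one lookup table, which also shows how many classes possibility 2 would have to scan. This is a sketch only: the `TagRule` structure and its field names are hypothetical, and which tag leads to the gene and which to the paper is my reading of the lists above rather than something checked against the models.

```python
from typing import NamedTuple, Optional, Tuple


class TagRule(NamedTuple):
    category: str
    ace_class: str
    gene_tag: Optional[str]          # None: the object itself is the gene
    paper_tag: str
    required: Tuple[str, ...] = ()   # tags that must all be present
    any_of: Tuple[str, ...] = ()     # at least one of these must be present


RULES = [
    TagRule("GO", "GO_annotation", "Gene", "Reference"),
    TagRule("PPI", "Interaction", "Interactor_overlapping_gene", "Paper",
            required=("Interaction_type Physical",)),
    TagRule("Phenotype", "RNAi", "Inhibits", "Reference", required=("Phenotype",)),
    TagRule("Phenotype", "Variation", "Affects", "Reference", required=("Phenotype",)),
    TagRule("Expression", "Expr_pattern", "Expression_of", "Reference"),
    TagRule("Sequence", "Variation", "Affects", "Reference",
            any_of=("Nonsense", "Missense", "Silent",
                    "Splice_site", "Frameshift", "Readthrough")),
    TagRule("Disease", "Gene", None, "Paper_evidence", required=("Disease_info",)),
]


def classes_to_scan(rules=RULES):
    """Return each class named in the rules once, in first-seen order."""
    seen = []
    for rule in rules:
        if rule.ace_class not in seen:
            seen.append(rule.ace_class)
    return seen
```

Variation appears under both Phenotype and Sequence, so a single pass over that class could serve both categories.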