UniProt Paper - Gene - Data Type

From WormBaseWiki

Revision as of 20:47, 20 May 2019

2019 Pipeline

Objective: supply a regularly updated file to UniProt that lists:

WBGene WBPaperID PMID Category


The categories are gene-specific, and we will supply information for:


Expression

GO

Phenotype

PPI (Protein-Protein Interaction)

Disease

Sequence

The new pipeline proposal is to retrieve information from the WormBase FTP site, which is updated with each release.

ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release


From: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY

General principles:

  1. Ignore lines with the NOT qualifier (column 4 in the GO and Phenotype files).
  2. Convert WB_REFs to PMIDs (still to be confirmed with Ceci).
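Below is a minimal Python sketch of these two principles. It assumes tab-separated, GAF-style lines in which column 4 holds the qualifier and column 6 holds references such as `WB_REF:WBPaper00046480|PMID:21873635`. The `paper_to_pmid` dictionary is a hypothetical stand-in for a WBPaper-to-PMID lookup. The source of that mapping has not been settled yet.

```python
from typing import Optional


def is_negated(columns: list[str]) -> bool:
    """Return True if the qualifier (column 4, index 3) contains NOT."""
    return len(columns) > 3 and "NOT" in columns[3].split("|")


def reference_to_pmid(reference: str, paper_to_pmid: dict[str, str]) -> Optional[str]:
    """Return a PMID for a column-6 reference, or None if none is found.

    An explicit PMID in the reference wins; otherwise the WB_REF WBPaper ID is
    looked up in paper_to_pmid (a hypothetical, externally built mapping).
    """
    parts = reference.split("|")
    for part in parts:
        if part.startswith("PMID:"):
            return part[len("PMID:"):]
    for part in parts:
        if part.startswith("WB_REF:"):
            return paper_to_pmid.get(part[len("WB_REF:"):])
    return None


# Example using the reference quoted in the GO section:
print(reference_to_pmid("WB_REF:WBPaper00046480|PMID:21873635", {}))  # prints 21873635
```

When a line carries an explicit PMID, the sketch uses it directly. Only references that contain just a WBPaper ID need the lookup.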

Expression:

  • File: anatomy_association.WSnnn.wb
  • File: development_association.WSnnn.wb
  • Convert WBPaper ids into PMID
  • Relevant columns: 2 and 6
  • Remove redundant gene-paper associations
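Here is a sketch of the Expression steps. It assumes both files are tab-separated, with the gene in column 2 and the reference in column 6, and that lines starting with '!' are headers (as in GAF). WBPaper IDs are picked out of column 6 with a regular expression, because the exact reference syntax in these files is an assumption. `paper_to_pmid` is a hypothetical mapping.

```python
import re
from typing import Iterable

# Any WBPaper identifier inside the reference column (column 6).
WBPAPER_ID = re.compile(r"WBPaper\d+")


def expression_associations(
    lines: Iterable[str], paper_to_pmid: dict[str, str]
) -> set[tuple[str, str, str]]:
    """Return unique (WBGene, WBPaper, PMID) triples from expression files.

    Uses column 2 (gene) and column 6 (reference). Collecting into a set drops
    redundant gene-paper associations, including ones repeated across files.
    Papers with no PMID in the (hypothetical) paper_to_pmid mapping are skipped.
    """
    associations: set[tuple[str, str, str]] = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # blank line or GAF-style header/comment
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 6:
            continue  # malformed line
        gene, reference = columns[1], columns[5]
        for paper in WBPAPER_ID.findall(reference):
            pmid = paper_to_pmid.get(paper)
            if pmid is not None:
                associations.add((gene, paper, pmid))
    return associations
```

Both files can go through one call, for example `expression_associations(itertools.chain(anatomy_file, development_file), paper_to_pmid)`. That way, duplicates across the anatomy and development files are removed as well.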

Disease:

  • File: disease_association.WSnnn.wb
  • Convert WBPaper ids into PMID
  • Ignore lines with 'IEA' in column 7
  • Relevant columns: 2 and 6
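The IEA filter could look like the sketch below. It assumes tab-separated lines with the evidence code in column 7 (as in GAF), with '!' marking header lines. The WBPaper-to-PMID conversion is left out so that the filter stands on its own.

```python
from typing import Iterable, Iterator


def disease_gene_references(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (gene, reference) from columns 2 and 6, skipping IEA lines.

    Assumes the evidence code sits in column 7 (index 6), as in GAF-style files.
    """
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # blank line or header
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 7:
            continue  # too short to carry an evidence code
        if columns[6].strip() == "IEA":
            continue  # drop electronically inferred annotations
        yield columns[1], columns[5]
```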

GO:

  • File: gene_association.WSnnn.wb.c_elegans
  • Ignore lines with 'NOT' in column 4
  • Only use annotation lines that include a PMID, but ignore lines with WB_REF:WBPaper00046480|PMID:21873635
  • Relevant columns: 2 and 6
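The sketch below shows the GO rules in one pass. It assumes the same tab-separated GAF layout: qualifier in column 4 and reference in column 6. The excluded reference is treated as a substring match, which is an interpretation of "lines with" in the rule above.

```python
from typing import Iterable

# Reference that the GO rules say to ignore.
EXCLUDED_REFERENCE = "WB_REF:WBPaper00046480|PMID:21873635"


def go_gene_pmids(lines: Iterable[str]) -> set[tuple[str, str]]:
    """Return unique (gene, PMID) pairs from a GO association file.

    Skips NOT-qualified lines (column 4), lines without a PMID in column 6,
    and lines carrying the excluded reference.
    """
    pairs: set[tuple[str, str]] = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # blank line or header
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 6:
            continue  # malformed line
        if "NOT" in columns[3].split("|"):
            continue  # negated annotation
        reference = columns[5]
        if EXCLUDED_REFERENCE in reference:
            continue  # explicitly excluded reference
        for part in reference.split("|"):
            if part.startswith("PMID:"):
                pairs.add((columns[1], part[len("PMID:"):]))
    return pairs
```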

Phenotype:

  • File: phenotype_association.WSnnn.wb
  • Ignore lines with 'NOT' in column 4
  • Convert WBPaper ids into PMID
  • Relevant columns: 2 and 6


PPI:



Sequence:

    ?Variation -> Affects -> Gene
               -> Type_of_mutation -> Paper_evidence

2015 Pipeline

Original file generated for UniProt:

http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/uniprot.cgi

What we currently supply:

WBGene WBPaperID PMID

It is not clear how this file is generated; I believe it was set up before my time as paper curator.


The proposed update to the file adds the data types curated for a given gene.

We would need to add a Category column:

WBGene WBPaperID PMID Category


The categories are gene-specific, and we will supply information for:

GO

PPI (Protein-Protein Interaction)

Phenotype

Disease

Expression

Sequence


An example:

    WBGene00003508  WBPaper00003680  pmid10517638  GO;Phenotype;Disease;Expression 
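A sketch of how per-category associations could be merged into lines like the example. It assumes the inputs are already (gene, paper, PMID, category) tuples and that fields are tab-separated. Categories are ordered as in the list above, which reproduces the example's "GO;Phenotype;Disease;Expression".

```python
from collections import defaultdict
from typing import Iterable

# Category order from the list above; unknown categories are dropped.
CATEGORY_ORDER = ["GO", "PPI", "Phenotype", "Disease", "Expression", "Sequence"]


def format_rows(associations: Iterable[tuple[str, str, str, str]]) -> list[str]:
    """Merge (gene, paper, pmid, category) tuples into one line per gene-paper pair."""
    grouped: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for gene, paper, pmid, category in associations:
        grouped[(gene, paper, pmid)].add(category)
    rows = []
    for (gene, paper, pmid), categories in sorted(grouped.items()):
        ordered = [c for c in CATEGORY_ORDER if c in categories]
        rows.append("\t".join([gene, paper, pmid, ";".join(ordered)]))
    return rows


example = [
    ("WBGene00003508", "WBPaper00003680", "pmid10517638", category)
    for category in ["Expression", "GO", "Disease", "Phenotype"]
]
print(format_rows(example)[0])
```

Running it on the four categories of the example gene-paper pair prints the example line above, with tabs between fields.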



Strategy: there are several possible strategies, and it is not yet clear which is best.

Would it be easiest to get everything from WS, or from a mixture of WS and Postgres?

Some data, such as GO annotations and RNAi and Variation phenotypes, must come from WS.

Possibilities:

1) Start with the Paper object and trace the information in the objects cross-referenced in its Refers_to tag; this works for everything except Disease.

2) Look at each object in each relevant class; this seems computationally very intensive.
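Possibility 1 can be pictured as the toy sketch below. It assumes the relevant WS data has already been dumped into Python dictionaries. The dictionary layout and the class-to-category mapping are hypothetical. `object_genes` is assumed to hold only objects that already pass the per-class checks in the models below (e.g. Interaction_type Physical, phenotype observed).

```python
from typing import Iterator

# Hypothetical mapping from WS classes to UniProt categories. Variation is left
# out because it feeds both Phenotype and Sequence depending on which tags are
# filled in; Disease is not reachable through Refers_to.
CLASS_TO_CATEGORY = {
    "GO_annotation": "GO",
    "Interaction": "PPI",
    "RNAi": "Phenotype",
    "Expr_pattern": "Expression",
}


def paper_centric(
    refers_to: dict[str, list[tuple[str, str]]],
    object_genes: dict[tuple[str, str], list[str]],
) -> Iterator[tuple[str, str, str]]:
    """Yield (gene, paper, category) by walking each paper's Refers_to objects.

    refers_to maps a paper ID to (class, object ID) pairs; object_genes maps each
    pre-filtered object to the genes it names.
    """
    for paper_id, referenced in refers_to.items():
        for obj_class, obj_id in referenced:
            category = CLASS_TO_CATEGORY.get(obj_class)
            if category is None:
                continue  # class not mapped to a UniProt category
            for gene in object_genes.get((obj_class, obj_id), []):
                yield gene, paper_id, category
```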


Relevant tags in the different object models:


GO:

    ?GO_annotation -> Gene
                   -> Reference


PPI:

    ?Interaction -> Interaction_type Physical
                 -> Interactor_overlapping_gene
                 -> Paper


Phenotype:

    ?RNAi -> Inhibits -> Gene
          -> Phenotype (Only Phenotype Observed, doesn't matter what the Phenotype is)
          -> Reference
    ?Variation -> Affects -> Gene
               -> Phenotype (Only Phenotype Observed, doesn't matter what the Phenotype is)
               -> Reference


Expression:

    ?Expr_pattern -> Expression_of -> Gene
                  -> Reference


Sequence:

    ?Variation -> Affects -> Gene
               -> Type_of_mutation -> Paper_evidence
            

Disease:

    ?Gene -> Disease_info -> Experimental_model -> Evidence -> Paper_evidence
          -> Disease_info -> Disease_relevance -> Evidence -> Paper_evidence

Disease (alternative):

    Alternatively, use the OA tables for "Experimental Model For" and "Paper for Exp Mod",
    and for "Disease relevance" and "Paper for Disease Rel".
    The OA table names still need to be determined.