UniProt Paper - Gene - Data Type

Latest revision as of 12:45, 21 May 2019

2019 Pipeline

Objective: supply a regularly updated file to UniProt that lists:

WBGene WBPaperID PMID Category


The Categories would be gene-specific and we will supply information for:


Expression

GO

Phenotype

PPI (Protein-Protein Interaction)

Disease

Sequence

The new pipeline proposal is to retrieve the information from the WormBase FTP site, which will be updated with each release.

ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release


From: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY

General principles:

  1. ignore lines with NOT qualifier
  2. convert WB_REFs to PMIDs ?? will check with Ceci

Expression:

  • File: anatomy_association.WSnnn.wb
  • File: development_association.WS270.wb
  • Convert WBPaper ids into PMID
  • Relevant columns: 2 and 6
  • Remove redundant gene-paper associations
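The Expression steps above can be sketched in Python. This is only a minimal sketch, assuming the association files are tab-delimited, that "columns 2 and 6" are counted from 1, and that header or comment lines start with "!". The function name gene_paper_pairs is hypothetical.

```python
def gene_paper_pairs(lines):
    """Return unique (gene, reference) pairs taken from columns 2 and 6.

    Pairs are returned in first-seen order, so redundant gene-paper
    associations (same gene and paper, different anatomy or stage term)
    collapse into one entry.
    """
    seen = set()
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: header/comment lines start with "!".
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < 6:
            continue  # malformed or truncated row
        pair = (cols[1], cols[5])  # columns 2 and 6, 1-based
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs
```

Running the same function over both the anatomy and development files and concatenating the input lines would remove redundancy across the two files as well.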

Disease:

  • File: disease_association.WSnnn.wb
  • Convert WBPaper ids into PMID
  • Ignore lines with 'IEA' in column 7
  • Relevant columns: 2 and 6
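The IEA filter for the Disease file could look something like the following. This is a sketch under assumptions: the file is taken to be tab-delimited with 1-based column numbers, "!" is assumed to mark header lines, and the function name is hypothetical.

```python
import csv


def disease_pairs_without_iea(lines):
    """Return (gene, reference) pairs, skipping rows with IEA in column 7."""
    # QUOTE_NONE so stray quote characters in free text are not interpreted.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    kept = []
    for cols in reader:
        # Assumption: header/comment lines start with "!".
        if not cols or cols[0].startswith("!"):
            continue
        if len(cols) < 7:
            continue  # cannot check the evidence code, so skip the row
        if cols[6].strip() == "IEA":
            continue  # electronically inferred annotation, not wanted
        kept.append((cols[1], cols[5]))  # columns 2 and 6, 1-based
    return kept
```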

GO:

  • File: gene_association.WSnnn.wb.c_elegans
  • Ignore lines with 'NOT' in column 4
  • Only use annotation lines that include a PMID, but ignore lines with WB_REF:WBPaper00046480|PMID:21873635
  • Relevant columns: 2 and 6

Phenotype:

  • File: phenotype_association.WSnnn.wb
  • Ignore lines with 'NOT' in column 4
  • Convert WBPaper ids into PMID
  • Relevant columns: 2 and 6
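The WBPaper-to-PMID conversion is still an open question on this page, so the following is only a sketch of one way it might work. It assumes a lookup table from WBPaper ID to PMID is available from somewhere (the paper_to_pmid dictionary here is hypothetical), that the file is tab-delimited with 1-based columns, and that column 6 contains a WBPaper ID somewhere in its text.

```python
import re


def phenotype_triples(lines, paper_to_pmid):
    """Return (gene, WBPaper ID, PMID or None) for each usable phenotype row."""
    triples = []
    for line in lines:
        if line.startswith("!"):  # assumed header/comment marker
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue
        if "NOT" in cols[3]:
            continue  # 'NOT' in column 4
        match = re.search(r"WBPaper\d+", cols[5])
        if match is None:
            continue  # no paper reference on this row
        paper = match.group(0)
        # Papers with no known PMID are kept, with None in the PMID slot.
        triples.append((cols[1], paper, paper_to_pmid.get(paper)))
    return triples
```

Keeping unmapped papers with a None PMID is a design choice, not something this page specifies; dropping them instead would be a one-line change.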

From: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/species/c_elegans/PRJNA13758/annotation/

PPI:

  • Problem with this file: papers are listed as text, not IDs
  • Check with Chris G. about another file?
  • May need to dump from postgres
  • Need type of interaction (physical AND proteinprotein), reference, and each gene listed

From: ?? need to email Hinxton

Sequence:

    ?Variation -> Affects -> Gene
               -> Type_of_mutation -> Paper_evidence

2015 Pipeline

Original file generated for UniProt:

http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/uniprot.cgi

What we currently supply:

WBGene WBPaperID PMID

Not sure how this is generated; I believe this was done before my time as paper curator.


Proposed updates to file will now include data types curated for a given gene.

We would need to add:

WBGene WBPaperID PMID Category


The Categories would be gene-specific and we will supply information for:

GO

PPI (Protein-Protein Interaction)

Phenotype

Disease

Expression

Sequence


An example:

    WBGene00003508  WBPaper00003680  pmid10517638  GO;Phenotype;Disease;Expression 
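A line like the example above could be assembled by merging per-category records for the same gene and paper. The sketch below is illustrative only: the record layout, the function name, and the tab separator are assumptions, and the category order simply follows the list given above.

```python
from collections import defaultdict

# Order taken from the category list on this page.
CATEGORY_ORDER = ["GO", "PPI", "Phenotype", "Disease", "Expression", "Sequence"]


def build_lines(records):
    """Merge (gene, paper, pmid, category) records into one line per gene-paper.

    All four fields are expected to be strings. Categories are joined
    with ';' and duplicates are collapsed.
    """
    merged = defaultdict(set)
    for gene, paper, pmid, category in records:
        merged[(gene, paper, pmid)].add(category)

    lines = []
    for (gene, paper, pmid), categories in sorted(merged.items()):
        ordered = [c for c in CATEGORY_ORDER if c in categories]
        # Assumption: fields are tab-separated in the delivered file.
        lines.append("\t".join([gene, paper, pmid, ";".join(ordered)]))
    return lines
```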



Strategy: Several strategies are possible; not sure which is best.

Is it easiest to get everything from WS, or from a mixture of WS and postgres?

Some things, like GO, RNAi and Variation Phenotypes, need to come from WS.

Possibilities:

1) Start with Paper object and then trace the information in the objects xref'ed in the Refers_to tag - this works for everything but Disease

2) Look at each object in each relevant class - this seems computationally very intensive


Relevant tags in the different object models:


GO:

    ?GO_annotation -> Gene
                   -> Reference


PPI:

    ?Interaction -> Interaction_type Physical
                 -> Interactor_overlapping_gene
                 -> Paper


Phenotype:

    ?RNAi -> Inhibits -> Gene
          -> Phenotype (Only Phenotype Observed, doesn't matter what the Phenotype is)
          -> Reference
    ?Variation -> Affects -> Gene
               -> Phenotype (Only Phenotype Observed, doesn't matter what the Phenotype is)
               -> Reference


Expression:

    ?Expr_pattern -> Expression_of -> Gene
                  -> Reference


Sequence:

    ?Variation -> Affects -> Gene
               -> Type_of_mutation -> Paper_evidence
            

Disease:

    ?Gene -> Disease_info -> Experimental_model -> Evidence -> Paper_evidence
          -> Disease_info -> Disease_relevance -> Evidence -> Paper_evidence

Disease (alternative):

    Alternatively, use the OA tables for:
        Experimental Model For and Paper for Exp Mod
        Disease relevance and Paper for Disease Rel
    Will need to get the OA table names