2019 Pipeline

Objective: supply a regularly updated file to UniProt that lists:

WBGene WBPaperID PMID Category

The Categories would be gene-specific and we will supply information for:

GO

PPI (Protein-Protein Interaction)

Phenotype

Disease

Expression

Sequence

The new pipeline proposal is to retrieve the information from the WormBase FTP site, which is updated with each release.

ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release

From: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY

General principles:

Ignore lines with a NOT qualifier (column 4 in the GO and Phenotype files)
Convert WB_REFs to PMIDs (open question, will check with Ceci)

GO:

File: gene_association.WSnnn.wb.c_elegans
Ignore lines with 'NOT' in column 4
Only use annotation lines that include a PMID, but ignore lines with WB_REF:WBPaper00046480|PMID:21873635
Relevant columns: 2 and 6
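
The GO rules above can be sketched as a small line filter. This is a minimal sketch, not the pipeline's actual code: it assumes the file is tab-separated, that comment lines start with "!", and that the excluded reference appears in column 6 exactly as written above. The names parse_go_line, EXCLUDED_REFERENCE and PMID_PATTERN are hypothetical.

```python
import re

# Assumption: the excluded paper appears in column 6 exactly as written in the rule.
EXCLUDED_REFERENCE = "WB_REF:WBPaper00046480|PMID:21873635"
PMID_PATTERN = re.compile(r"PMID:(\d+)")


def parse_go_line(line):
    """Return (gene_id, pmid) for a usable annotation line, else None."""
    if line.startswith("!") or not line.strip():
        return None  # comment/header line or blank line
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 6:
        return None  # too short to hold columns 2, 4 and 6
    qualifier, reference = cols[3], cols[5]
    if "NOT" in qualifier.split("|"):
        return None  # NOT qualifier in column 4
    if reference == EXCLUDED_REFERENCE:
        return None  # the one reference the rule says to ignore
    match = PMID_PATTERN.search(reference)
    if match is None:
        return None  # only keep lines that carry a PMID
    return cols[1], match.group(1)  # column 2 (gene) and the PMID from column 6
```

Each non-None result is one (WBGene, PMID) pair for the GO category.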

Phenotype:

File: phenotype_association.WSnnn.wb
Ignore lines with 'NOT' in column 4
Convert WBPaper ids into PMID
Relevant columns: 2 and 6
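
A similar sketch for the Phenotype rules, under stated assumptions: the file is tab-separated, column 6 holds one or more WBPaper IDs, and a WBPaper-to-PMID lookup table is available from some other source (where that mapping comes from is still open, see the general principles). The names parse_phenotype_line and paper_to_pmid are hypothetical.

```python
import re

# Assumption: WBPaper IDs are "WBPaper" followed by eight digits.
WBPAPER_PATTERN = re.compile(r"WBPaper\d{8}")


def parse_phenotype_line(line, paper_to_pmid):
    """Return a list of (gene_id, wbpaper_id, pmid_or_None) for one line."""
    if line.startswith("!") or not line.strip():
        return []  # comment/header line or blank line
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 6 or "NOT" in cols[3].split("|"):
        return []  # malformed line, or NOT qualifier in column 4
    gene_id = cols[1]  # column 2
    # Column 6: look up each WBPaper ID; pmid is None when no mapping is known.
    return [
        (gene_id, paper, paper_to_pmid.get(paper))
        for paper in WBPAPER_PATTERN.findall(cols[5])
    ]
```

Papers without a known PMID are returned with None rather than dropped, so the caller can decide how to handle them.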

PPI:

Expression:

    ?Expr_pattern -> Expression_of -> Gene
                  -> Reference

Sequence:

    ?Variation -> Affects -> Gene
               -> Type_of_mutation -> Paper_evidence

Disease:

    ?Gene -> Disease_info -> Experimental_model -> Evidence -> Paper_evidence
          -> Disease_info -> Disease_relevance -> Evidence -> Paper_evidence

Disease (alternative source):

    Use the OA tables for Experimental Model For and Paper for Exp Mod,
    and for Disease relevance and Paper for Disease Rel.
    The OA table names still need to be looked up.

2015 Pipeline

Original file generated for UniProt:

http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/uniprot.cgi

What we currently supply:

WBGene WBPaperID PMID

Not sure how this is generated; I believe this was done before my time as paper curator.

The proposed update to the file adds the data types curated for a given gene.

We would need to add:

WBGene WBPaperID PMID Category

The Categories would be gene-specific and we will supply information for:

GO

PPI (Protein-Protein Interaction)

Phenotype

Disease

Expression

Sequence

An example:

    WBGene00003508  WBPaper00003680  pmid10517638  GO;Phenotype;Disease;Expression
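
One way to build lines like the example is to collect one (gene, paper, PMID, category) record per curated data type and then merge the categories for each gene-paper pair. The sketch below assumes tab-separated output, the "pmid" prefix shown in the example, and the category order used in the list above; merge_categories and CATEGORY_ORDER are hypothetical names.

```python
from collections import defaultdict

# Assumption: categories are written in the order they are listed on this page.
CATEGORY_ORDER = ["GO", "PPI", "Phenotype", "Disease", "Expression", "Sequence"]


def merge_categories(records):
    """Merge (gene, paper, pmid, category) records into one line per gene-paper pair."""
    grouped = defaultdict(set)
    for gene, paper, pmid, category in records:
        grouped[(gene, paper, pmid)].add(category)
    lines = []
    for (gene, paper, pmid), categories in sorted(grouped.items()):
        ordered = [c for c in CATEGORY_ORDER if c in categories]
        lines.append("\t".join([gene, paper, f"pmid{pmid}", ";".join(ordered)]))
    return lines


records = [
    ("WBGene00003508", "WBPaper00003680", "10517638", "Expression"),
    ("WBGene00003508", "WBPaper00003680", "10517638", "GO"),
    ("WBGene00003508", "WBPaper00003680", "10517638", "Disease"),
    ("WBGene00003508", "WBPaper00003680", "10517638", "Phenotype"),
]
for line in merge_categories(records):
    print(line)
```

With the four records shown, this prints the example line above with its fields separated by tabs.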

Strategy: several strategies are possible; it is not yet clear which is best.

Is it easiest to get everything from WS, or from a mixture of WS and postgres?

Some things, like GO, RNAi and Variation Phenotypes, need to come from WS.

Possibilities:

1) Start with Paper object and then trace the information in the objects xref'ed in the Refers_to tag - this works for everything but Disease

2) Look at each object in each relevant class - this seems computationally very intensive

Relevant tags in the different object models:

GO:

    ?GO_annotation -> Gene
                   -> Reference

PPI:

    ?Interaction -> Interaction_type Physical
                 -> Interactor_overlapping_gene
                 -> Paper

Phenotype:

    ?RNAi -> Inhibits -> Gene
          -> Phenotype (Only Phenotype Observed, doesn't matter what the Phenotype is)
          -> Reference

    ?Variation -> Affects -> Gene
               -> Phenotype (Only Phenotype Observed, doesn't matter what the Phenotype is)
               -> Reference

Expression:

    ?Expr_pattern -> Expression_of -> Gene
                  -> Reference

Sequence:

    ?Variation -> Affects -> Gene
               -> Type_of_mutation -> Paper_evidence

Disease:

    ?Gene -> Disease_info -> Experimental_model -> Evidence -> Paper_evidence
          -> Disease_info -> Disease_relevance -> Evidence -> Paper_evidence

Disease (alternative source):

    Use the OA tables for Experimental Model For and Paper for Exp Mod,
    and for Disease relevance and Paper for Disease Rel.
    The OA table names still need to be looked up.

UniProt Paper - Gene - Data Type
