UniProt Paper - Gene - Data Type
Revision as of 20:40, 20 May 2019
2019 Pipeline
Objective: supply a regularly updated file to UniProt that lists:
WBGene WBPaperID PMID Category
The Categories would be gene-specific and we will supply information for:
Expression
GO
Phenotype
PPI (Protein-Protein Interaction)
Disease
Sequence
The new pipeline proposal is to retrieve the information from the WormBase FTP site, which is updated with each release.
ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release
From: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/ONTOLOGY
General principles:
- Ignore lines with the NOT qualifier (column 4 in the GO and Phenotype files)
- Convert WB_REFs to PMIDs (to be confirmed; will check with Ceci)
Expression:
- File: anatomy_association.WS270.wb
- File: development_association.WS270.wb
- Convert WBPaper ids into PMID
- Relevant columns: 2 and 6
- Remove redundant gene-paper associations
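The Expression steps above can be sketched in Python as follows. This is a minimal sketch, not the pipeline's actual code. It assumes the association files are tab-separated, that header lines start with "!", and that column 6 holds pipe-separated references of the form WB_REF:WBPaperNNNNNNNN. The function name expression_pairs and the paper_to_pmid lookup table are hypothetical; the source of the WBPaper-to-PMID mapping is not specified here.

```python
def expression_pairs(lines, paper_to_pmid):
    """Return unique (gene, wbpaper, pmid, category) tuples.

    lines         -- iterable of text lines from an association file
    paper_to_pmid -- hypothetical dict mapping WBPaper ids to PMIDs
    """
    seen = set()
    result = []
    for line in lines:
        # Assumption: "!" marks header/comment lines, as in GAF-style files.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue
        gene = cols[1]                      # column 2: gene id
        for ref in cols[5].split("|"):      # column 6: reference(s)
            if not ref.startswith("WB_REF:"):
                continue
            paper = ref[len("WB_REF:"):]
            pmid = paper_to_pmid.get(paper)
            if pmid is None:
                continue                    # paper has no known PMID
            key = (gene, paper)
            if key not in seen:             # drop redundant gene-paper pairs
                seen.add(key)
                result.append((gene, paper, pmid, "Expression"))
    return result
```

The same function could be run over the lines of both the anatomy and the development association files concatenated, so that a gene-paper pair present in both is reported only once.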
GO:
- File: gene_association.WSnnn.wb.c_elegans
- Ignore lines with 'NOT' in column 4
- Only use annotation lines that include a PMID, but ignore lines with WB_REF:WBPaper00046480|PMID:21873635
- Relevant columns: 2 and 6
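The GO rules above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the actual pipeline: the file is assumed to be tab-separated with "!" header lines, column 4 is assumed to hold pipe-separated qualifiers, and column 6 is assumed to hold pipe-separated references such as WB_REF:WBPaper...|PMID:.... The function name go_pairs is hypothetical.

```python
# Reference value the notes above say to ignore.
EXCLUDED_REFERENCE = "WB_REF:WBPaper00046480|PMID:21873635"


def go_pairs(lines):
    """Return sorted unique (gene, wbpaper, pmid, "GO") tuples."""
    pairs = set()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue
        gene, qualifier, reference = cols[1], cols[3], cols[5]
        # Skip negated annotations (NOT in column 4).
        if "NOT" in qualifier.split("|"):
            continue
        # Skip the one reference that is explicitly excluded.
        if reference == EXCLUDED_REFERENCE:
            continue
        wbpaper, pmid = "", None
        for ref in reference.split("|"):
            prefix, _, ident = ref.partition(":")
            if prefix == "PMID":
                pmid = ident
            elif prefix == "WB_REF":
                wbpaper = ident
        if pmid is None:
            continue                        # only keep lines with a PMID
        pairs.add((gene, wbpaper, pmid, "GO"))
    return sorted(pairs)
```

Collecting into a set removes repeated gene-paper pairs, which arise when one paper supports several GO terms for the same gene.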
Phenotype:
- File: phenotype_association.WSnnn.wb
- Ignore lines with 'NOT' in column 4
- Convert WBPaper ids into PMID
- Relevant columns: 2 and 6
PPI:
Expression:
?Expr_pattern -> Expression_of -> Gene -> Reference
Sequence:
?Variation -> Affects -> Gene -> Type_of_mutation -> Paper_evidence
Disease:
?Gene -> Disease_info -> Experimental_model -> Evidence -> Paper_evidence
?Gene -> Disease_info -> Disease_relevance -> Evidence -> Paper_evidence
Alternatively, use the OA tables for Experimental Model For and Paper for Exp Mod, and for Disease relevance and Paper for Disease Rel.
Will need to get the OA table names.
2015 Pipeline
Original file generated for UniProt:
http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/uniprot.cgi
What we currently supply:
WBGene WBPaperID PMID
Not sure how this is generated; I believe this was done before my time as paper curator.
The proposed update to the file will include the data types curated for a given gene.
We would need to add:
WBGene WBPaperID PMID Category
The Categories would be gene-specific and we will supply information for:
GO
PPI (Protein-Protein Interaction)
Phenotype
Disease
Expression
Sequence
An example:
WBGene00003508 WBPaper00003680 pmid10517638 GO;Phenotype;Disease;Expression
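One way to produce lines like the example above is to collect per-category records and merge them by gene-paper pair. The sketch below is illustrative only: merge_categories and CATEGORY_ORDER are hypothetical names, the category order simply follows the list above, and the single-space separator and "pmid" prefix mirror the example line as displayed (the real file may well be tab-separated).

```python
from collections import defaultdict

# Order taken from the category list above; used to order the joined field.
CATEGORY_ORDER = ["GO", "PPI", "Phenotype", "Disease", "Expression", "Sequence"]


def merge_categories(records, sep=" "):
    """Merge (gene, wbpaper, pmid, category) records into output lines.

    Each gene-paper pair yields one line whose last field lists every
    category seen for that pair, joined with ";".
    """
    merged = defaultdict(set)
    for gene, paper, pmid, category in records:
        merged[(gene, paper, pmid)].add(category)

    lines = []
    for (gene, paper, pmid), cats in sorted(merged.items()):
        # Categories not in CATEGORY_ORDER are silently dropped here.
        ordered = [c for c in CATEGORY_ORDER if c in cats]
        lines.append(sep.join([gene, paper, "pmid" + pmid, ";".join(ordered)]))
    return lines
```

Feeding it four records for WBGene00003508 and WBPaper00003680 (one each for GO, Phenotype, Disease and Expression) yields the example line shown above.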
Strategy: there are several possible strategies; it is not yet clear which is best.
Is it easiest to get everything from WS, or from a mixture of WS and postgres?
Some things, like GO, RNAi and Variation Phenotypes, need to be from WS
Possibilities:
1) Start with the Paper object and trace the information in the objects cross-referenced (xref'ed) in the Refers_to tag; this works for everything but Disease
2) Look at each object in each relevant class; this seems computationally very intensive
Relevant tags in the different object models:
GO:
?GO_annotation -> Gene -> Reference
PPI:
?Interaction -> Interaction_type Physical -> Interactor_overlapping_gene -> Paper
Phenotype:
?RNAi -> Inhibits -> Gene -> Phenotype (Only Phenotype Observed, doesn't matter what the Phenotype is) -> Reference
?Variation -> Affects -> Gene -> Phenotype (Only Phenotype Observed, doesn't matter what the Phenotype is) -> Reference
Expression:
?Expr_pattern -> Expression_of -> Gene -> Reference
Sequence:
?Variation -> Affects -> Gene -> Type_of_mutation -> Paper_evidence
Disease:
?Gene -> Disease_info -> Experimental_model -> Evidence -> Paper_evidence
?Gene -> Disease_info -> Disease_relevance -> Evidence -> Paper_evidence
Alternatively, use the OA tables for Experimental Model For and Paper for Exp Mod, and for Disease relevance and Paper for Disease Rel.
Will need to get the OA table names.