Postgres queries
back to Caltech_documentation
Contents
Query page
postgres queries can be entered here
To make a query, you need to know the postgres table that carries the values you want.
Postgres table prefixes are listed in the third column in this wiki table.
More details of tables for specific datatypes are listed here.
A standard query follows this format (do not forget the ';' at the end of the query)
SELECT * FROM cur_transgene WHERE cur_transgene IS NOT NULL;
In this case, you are querying for any transgene value in the cur_transgene table where a value exists.
Queries
Bioentity Lexica postgres queries
Get data from Postgres for making a Lexicon for article linking.
- Start with a joinkey (pgid), then each of the tables dot their column (same name) from the tables where each table.joinkey equals another table.joinkey, so they all refer to the same joinkey.
- Transgenes
SELECT trp_name.joinkey, trp_name.trp_name, trp_publicname.trp_publicname, trp_synonym.trp_synonym, trp_summary.trp_summary FROM trp_name, trp_publicname, trp_synonym, trp_summary WHERE trp_name.joinkey = trp_publicname.joinkey AND trp_synonym.joinkey = trp_name.joinkey AND trp_summary.joinkey = trp_name.joinkey ; This will return table with columns pgid | WBTransgeneID | Public Name | Synonym | Genotype Remove the first column and save as tsv file to upload into Bioentity.link
- Molecules
SELECT mop_name.joinkey, mop_name.mop_name, mop_publicname.mop_publicname, mop_synonym.mop_synonym, mop_molecule.mop_molecule, mop_chebi.mop_chebi, mop_chemi.mop_chemi FROM mop_name, mop_publicname, mop_synonym, mop_molecule, mop_chebi, mop_chemi WHERE mop_name.joinkey = mop_publicname.joinkey AND mop_synonym.joinkey = mop_name.joinkey AND mop_molecule.joinkey = mop_name.joinkey AND mop_chebi.joinkey = mop_name.joinkey AND mop_chemi.joinkey = mop_name.joinkey ;
This will return table with columns pgid | WBMolID | Public Name | Synonym | MeshID | ChEBI ID | Chemi ID Remove the first column, combine last three columns and save as tsv file to upload into Bioentity.link
- Variations
SELECT * FROM obo_data_variation
WBVar00270471 id: WBVar00270471 name: "WBVar00270471" other_name: "cewivar00691981" other_name: "uCE4-800" species: "Caenorhabditis elegans" status: "Live" reference: "WBPaper00005369" gene: "WBGene00006869 vab-2" 2024-08-11 20:00:18-07 WBVar00270472 id: WBVar00270472 name: "WBVar00270472" other_name: "uCE4-805" species: "Caenorhabditis elegans" status: "Live" reference: "WBPaper00005369" gene: "WBGene00019546 glct-6" 2024-08-11 20:00:18-07
Use a text editor and or spreadsheet to form the appropriate columns - keep in mind if separating the gene name into its own spreadsheet column, it could be altered into a date
- Strains
SELECT * FROM obo_data_strain
WBStrain00000001 id: WBStrain00000001 name: "N2" summary: "Caenorhabditis elegans wild isolate." location: "CGC" wild_isolate: "Wild_isolate" species: "Caenorhabditis elegans" 2024-08-11 20:00:18-07 WBStrain00000002 id: WBStrain00000002 name: "VC2010" summary: "Caenorhabditis elegans wild isolate." location: "CGC" location: "VC" species: "Caenorhabditis elegans" wild_isolate: "Wild_isolate" 2024-08-11 20:00:18-07 Use a text editor and or spreadsheet to form the appropriate columns
Other queries
Heat sensitive alleles (from help desk question, answer from Chris Jul 8, 2022, 2:56 PM:
SELECT distinct app_variation FROM app_variation WHERE joinkey IN (SELECT joinkey FROM app_heat_sens) AND joinkey NOT IN (SELECT joinkey FROM app_nodump) AND joinkey NOT IN (SELECT joinkey FROM app_not);
Cold sensitive alleles:
SELECT distinct app_variation FROM app_variation WHERE joinkey IN (SELECT joinkey FROM app_cold_sens) AND joinkey NOT IN (SELECT joinkey FROM app_nodump) AND joinkey NOT IN (SELECT joinkey FROM app_not);
Corresponding acedb queries: Heat sensitive alleles: select l from l in class variation where exists l->phenotype[heat_sensitive] Cold sensitive alleles: select l from l in class variation where exists l->phenotype[cold_sensitive]
List PGID and WBMolname where no ChEBI value exists (note: this query disregards if one WBMolname does have a chebi, which is enough for the WBBuild to display the chebi ID for a molecule)
SELECT * FROM mop_name WHERE joinkey NOT IN (SELECT joinkey FROM mop_chebi);
List PGID and WBMolname where no ChEBI value exists for unique WBMol name
SELECT * FROM mop_name WHERE mop_name NOT IN (SELECT mop_name FROM mop_name WHERE joinkey IN (SELECT joinkey FROM mop_chebi)) ORDER BY mop_name; the molecule objects where the objects are not in the molecule names that have a pgid, a pgid that has chebi data, sorted by molecule id
List PGID and molecule public name where no ChEBI value exists
SELECT * FROM mop_publicname WHERE joinkey NOT IN (SELECT joinkey FROM mop_chebi);
Find (list) variations with a given phenotype, show phenotype, variation, and paper (query needs help)
SELECT app_variation.app_variation, app_paper.app_paper FROM app_variation, app_paper WHERE app_variation.joinkey = app_paper.joinkey AND joinkey in (SELECT joinkey FROM app_term WHERE app_term = '<phenotype id>';
Find all papers given a specific journal name, such as "eLife" (case insensitive), and list the pubmed ID in the output
SELECT pap_journal.joinkey, pap_identifier.pap_identifier, pap_journal.pap_journal FROM pap_journal, pap_identifier WHERE pap_journal.joinkey = pap_identifier.joinkey AND pap_journal.joinkey IN (SELECT joinkey FROM pap_journal WHERE pap_journal ~* 'elife') AND pap_identifier.pap_identifier ~ 'pmid';
Find the number of curatable papers in the corpus - papers with the primary tag
SELECT COUNT (*) FROM pap_primary_data WHERE pap_primary_data = 'primary';
Find the number of papers published in a given year and in the WB corpus
SELECT COUNT(*) FROM pap_year WHERE pap_year = '2013';
Find the number of curatable papers in postgres for a given year
SELECT COUNT(*) FROM pap_year WHERE pap_year = '2015' AND joinkey IN (SELECT joinkey FROM pap_primary_data WHERE pap_primary_data = 'primary');
List values in two different OA tables for one line where values are not empty - map molecule ids to chebi ids
SELECT mop_chebi.mop_chebi, mop_name.mop_name FROM mop_chebi, mop_name WHERE mop_chebi.joinkey = mop_name.joinkey;
Find a paper based on a title:
SELECT * FROM pap_title WHERE pap_title ~ '"title "';
a script to find papers by case-insensitive substring search of the title:
on tazendra /home/azurebrd/work/parsings/karen/20150925_witting_papers/simple_paper_title_search.pl
Show count for distinct papers curated for variation phenotypes:
SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE joinkey IN (SELECT joinkey FROM app_variation);
Show all tables of a particular OA (in this case, the phenotype app tables):
SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_name ~ '^app';
Same query as above however this one will filter out the history tables:
SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_name ~ '^app' AND table_name !~ 'hst$';
Checking for duplicate values in interaction (int_) or transgene (trp_) name tables or transgene synonym tables
SELECT int_name, COUNT(*) AS count FROM int_name GROUP BY int_name HAVING COUNT(*) > 1;
SELECT trp_name, COUNT(*) AS count FROM trp_name GROUP BY trp_name HAVING COUNT(*) > 1;
SELECT trp_synonym, COUNT(*) AS count FROM trp_synonym GROUP BY trp_synonym HAVING COUNT(*) > 1;
Show pgids where there is no WBTransgeneID assigned
SELECT * FROM trp_curator WHERE joinkey NOT IN (SELECT joinkey FROM trp_name);
Display pgids where the value in trp_synonym is a blank (note this is different from being null).
SELECT * FROM trp_synonym WHERE trp_synonym = ;
Find information for a variation in the obo_data_variation table (in this case for WBVar00087854).
SELECT * FROM obo_data_variation WHERE obo_data_variation ~ 'WBVar00087854';
Show values in a table where there is no 'null' or 'blank' values
SELECT * FROM trp_synonym WHERE trp_synonym IS NOT NULL AND trp_synonym != ;
The query for the list of topics (i.e., list of topics where the pgid is the same for process / paper / topicpaperstatus and the topicpaperstatus is either unchecked or relevant.)
SELECT DISTINCT(pro_process.pro_process) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND (pro_topicpaperstatus.pro_topicpaperstatus = 'unchecked' OR pro_topicpaperstatus.pro_topicpaperstatus = 'relevant') ORDER BY pro_process.pro_process
The filter for the papers given a topic (unique WBPapers where the pgid is the same for process / paper / topicpaperstatus and the topicpaperstatus is either unchecked or relevant, and the process is the chosen topic)
SELECT DISTINCT(pro_paper.pro_paper) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND (pro_topicpaperstatus.pro_topicpaperstatus = 'unchecked' OR pro_topicpaperstatus.pro_topicpaperstatus = 'relevant') AND pro_process.pro_process = '$topic'
To see how data are typically formatted for a given Postgres table (e.g., phenotype in RNA tables)
SELECT * FROM rna_phenotype;
To see multiple value data for a specific table:
SELECT * FROM rna_phenotype WHERE rna_phenotype ~ ',';
Other queries
Show all variations that exist in the app_tables but not in obo_name_variation hence, they will not be dumped.
SELECT * FROM app_variation WHERE app_variation NOT IN (SELECT joinkey FROM obo_name_variation);
Show all transgenes in the phenotype tables that start with "WBPaper"
SELECT * FROM app_transgene WHERE app_transgene ~ 'WBPaper';
How many papers were curated for variation-phenotype
SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE app_timestamp < '2011-05-01' AND joinkey IN (SELECT joinkey FROM app_variation);
How many papers were curated for strain-phenotype
SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE app_timestamp < '2011-05-01' AND joinkey IN (SELECT joinkey FROM app_strain);
How many papers were curated for transgene-phenotype
SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE app_timestamp < '2011-05-01' AND joinkey IN (SELECT joinkey FROM app_transgene);
The number of genes represented in by all phenotype curation of variations for those same time points.
SELECT COUNT(DISTINCT app_wbgene) FROM app_wbgene WHERE app_timestamp < '2011-05-01' AND joinkey IN (SELECT joinkey FROM app_variation);
The number of genes represented in by all phenotype curation of transgenes for those same time points.
SELECT COUNT(DISTINCT app_wbgene) FROM app_wbgene WHERE app_timestamp < '2011-05-01' AND joinkey IN (SELECT joinkey FROM app_transgene);
How many variations have molecules?
SELECT COUNT(*) FROM app_variation WHERE joinkey IN (SELECT joinkey FROM app_molecule);
Lines in the phenotype tables that are not dumped due to not being complete (happy); i.e., tagged with "down_right_disgusted".
SELECT * FROM app_curation_status WHERE app_curation_status != 'happy'; The values for the tempnames from that query SELECT * FROM app_tempname WHERE joinkey IN (SELECT joinkey FROM app_tempname WHERE app_tempname !~ 'WBVar' AND joinkey IN (SELECT joinkey FROM app_type WHERE app_type = 'Allele'));
Entries in the phenotype tables that have a type 'Allele' and don't have a tempname that matches WBVar
SELECT * FROM app_tempname WHERE app_tempname !~ 'WBVar' AND joinkey IN (SELECT joinkey FROM app_type WHERE app_type = 'Allele');
Phenotype remarks in app tables by reverse timestamp
SELECT * FROM app_phen_remark ORDER BY app_timestamp DESC;