Postgres queries

From WormBaseWiki
Revision as of 20:45, 28 January 2016 by Kyook (talk | contribs)
Jump to navigationJump to search

back to Caltech_documentation

Query page

postgres queries can be entered here

To make a query, you need to know the postgres table that carries the values you want.
Postgres table prefixes are listed in the third column in this wiki table.
More details of tables for specific datatypes are listed here.

A standard query follows this format (do not forget the ';' at the end of the query)

SELECT * FROM cur_transgene WHERE cur_transgene IS NOT NULL;

In this case, you are querying for any transgene value in the cur_transgene table where a value exists.

Queries

Find the number of curatable papers in the corpus - papers with the primary tag

SELECT COUNT (*) FROM pap_primary_data WHERE pap_primary_data = 'primary';

Find the number of papers published in a given year and in the WB corpus

SELECT COUNT(*) FROM pap_year WHERE pap_year = '2013';

Find the number of curatable papers in postgres for a given year

SELECT COUNT(*) FROM pap_year WHERE pap_year = '2015' AND joinkey IN (SELECT joinkey FROM pap_primary_data WHERE pap_primary_data = 'primary');

List values in two different OA tables for one line where values are not empty - map molecule ids to chebi ids

SELECT mop_chebi.mop_chebi, mop_name.mop_name FROM mop_chebi, mop_name WHERE mop_chebi.joinkey = mop_name.joinkey;

Find a paper based on a title:

SELECT * FROM pap_title WHERE pap_title ~ '"title "';

a script to find papers by case-insensitive substring search of the title:

on tazendra  /home/azurebrd/work/parsings/karen/20150925_witting_papers/simple_paper_title_search.pl

Show count for distinct papers curated for variation phenotypes:

SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE joinkey IN (SELECT joinkey FROM app_variation);

Show all tables of a particular OA (in this case, the phenotype app tables):

SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_name ~ '^app';

Same query as above however this one will filter out the history tables:

SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_name ~ '^app' AND table_name ! ~ 'hst$';

Checking for duplicate values in interaction (int_) or transgene (trp_) name tables or transgene synonym tables

SELECT int_name, COUNT(*) AS count FROM int_name GROUP BY int_name HAVING COUNT(*) > 1;
SELECT trp_name, COUNT(*) AS count FROM trp_name GROUP BY trp_name HAVING COUNT(*) > 1;
SELECT trp_synonym, COUNT(*) AS count FROM trp_synonym GROUP BY trp_synonym HAVING COUNT(*) > 1;

Show pgids where there is no WBTransgeneID assigned

SELECT * FROM trp_curator WHERE joinkey NOT IN (SELECT joinkey FROM trp_name);

Display pgids where the value in trp_synonym is a blank (note this is different from being null).

SELECT * FROM trp_synonym WHERE trp_synonym  = ;

Find information for a variation in the obo_data_variation table (in this case for WBVar00087854).

SELECT * FROM obo_data_variation WHERE obo_data_variation ~ 'WBVar00087854';

Show values in a table where there is no 'null' or 'blank' values

SELECT * FROM trp_synonym WHERE trp_synonym IS NOT NULL AND trp_synonym != ;

The query for the list of topics (i.e., list of topics where the pgid is the same for process / paper / topicpaperstatus and the topicpaperstatus is either unchecked or relevant.)

SELECT DISTINCT(pro_process.pro_process) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE
pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND
(pro_topicpaperstatus.pro_topicpaperstatus = 'unchecked' OR pro_topicpaperstatus.pro_topicpaperstatus =
'relevant') ORDER BY pro_process.pro_process

The filter for the papers given a topic (unique WBPapers where the pgid is the same for process / paper / topicpaperstatus and the topicpaperstatus is either unchecked or relevant, and the process is the chosen topic)

SELECT DISTINCT(pro_paper.pro_paper) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE 
pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND
(pro_topicpaperstatus.pro_topicpaperstatus = 'unchecked' OR pro_topicpaperstatus.pro_topicpaperstatus =
'relevant') AND pro_process.pro_process = '$topic'

To see how data are typically formatted for a given Postgres table (e.g., phenotype in RNA tables)

SELECT * FROM rna_phenotype;

To see multiple value data for a specific table:

SELECT * FROM rna_phenotype WHERE rna_phenotype ~ ',';

Other queries

Show all variations that exist in the app_tables but not in obo_name_variation hence, they will not be dumped.

SELECT * FROM app_variation WHERE app_variation NOT IN (SELECT joinkey FROM obo_name_variation);

Show all transgenes in the phenotype tables that start with "WBPaper"

SELECT * FROM app_transgene WHERE app_transgene ~ 'WBPaper';

How many papers were curated for variation-phenotype

SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE app_timestamp <
'2011-05-01' AND joinkey IN (SELECT joinkey FROM app_variation);

How many papers were curated for strain-phenotype

SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE app_timestamp <
'2011-05-01' AND joinkey IN (SELECT joinkey FROM app_strain);

How many papers were curated for transgene-phenotype

SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE app_timestamp <
'2011-05-01' AND joinkey IN (SELECT joinkey FROM app_transgene);

The number of genes represented in by all phenotype curation of variations for those same time points.

SELECT COUNT(DISTINCT app_wbgene) FROM app_wbgene WHERE app_timestamp
< '2011-05-01' AND joinkey IN (SELECT joinkey FROM app_variation);

The number of genes represented in by all phenotype curation of transgenes for those same time points.

SELECT COUNT(DISTINCT app_wbgene) FROM app_wbgene WHERE app_timestamp
< '2011-05-01' AND joinkey IN (SELECT joinkey FROM app_transgene);

How many variations have molecules?

SELECT COUNT(*) FROM app_variation WHERE joinkey IN (SELECT joinkey
FROM app_molecule);

Lines in the phenotype tables that are not dumped due to not being complete (happy); i.e., tagged with "down_right_disgusted".

SELECT * FROM app_curation_status WHERE app_curation_status != 'happy';

The values for the tempnames from that query

SELECT * FROM app_tempname WHERE joinkey IN (SELECT joinkey FROM app_tempname WHERE app_tempname !~ 'WBVar' AND joinkey IN (SELECT joinkey FROM app_type WHERE app_type = 'Allele'));

Entries in the phenotype tables that have a type 'Allele' and don't have a tempname that matches WBVar

SELECT * FROM app_tempname WHERE app_tempname !~ 'WBVar' AND joinkey IN (SELECT joinkey FROM app_type WHERE app_type = 'Allele');

Phenotype remarks in app tables by reverse timestamp

SELECT * FROM app_phen_remark ORDER BY app_timestamp DESC;