Difference between revisions of "Postgres queries"

From WormBaseWiki
Jump to navigationJump to search
(11 intermediate revisions by the same user not shown)
Line 11: Line 11:
 
In this case, you are querying for any transgene value in the cur_transgene table where a value exists.
 
In this case, you are querying for any transgene value in the cur_transgene table where a value exists.
  
==Commonly used queries==
+
==Queries==
Show count for distinct papers curated for variation phenotypes
+
 
 +
List PGID and WBMolname where no ChEBI value exists (note: this query disregards if one WBMolname does have a chebi, which is enough for the WBBuild to display the chebi ID for a molecule)
 +
SELECT * FROM mop_name WHERE joinkey NOT IN (SELECT joinkey FROM mop_chebi);
 +
 
 +
List PGID and WBMolname where no ChEBI value exists for unique WBMol name
 +
SELECT * FROM mop_name WHERE mop_name NOT IN (SELECT mop_name FROM mop_name WHERE joinkey IN (SELECT joinkey FROM mop_chebi)) ORDER BY mop_name;
 +
the molecule objects where the objects are not in the molecule names that have a pgid, a pgid that has chebi data, sorted by molecule id
 +
 
 +
List PGID and molecule public name where no ChEBI value exists
 +
SELECT * FROM mop_publicname WHERE joinkey NOT IN (SELECT joinkey FROM mop_chebi);
 +
 
 +
Find (list) variations with a given phenotype, show phenotype, variation, and paper (query needs help)
 +
SELECT app_variation.app_variation, app_paper.app_paper FROM app_variation, app_paper WHERE app_variation.joinkey = app_paper.joinkey AND joinkey in (SELECT joinkey FROM app_term WHERE app_term = '<phenotype id>';
 +
 
 +
Find all papers given a specific journal name, such as "eLife" (case insensitive), and list the pubmed ID in the output
 +
SELECT pap_journal.joinkey, pap_identifier.pap_identifier, pap_journal.pap_journal FROM pap_journal, pap_identifier WHERE pap_journal.joinkey = pap_identifier.joinkey AND pap_journal.joinkey IN (SELECT joinkey FROM pap_journal WHERE pap_journal ~* 'elife') AND pap_identifier.pap_identifier ~ 'pmid';
 +
 
 +
Find the number of curatable papers in the corpus - papers with the primary tag
 +
SELECT COUNT (*) FROM pap_primary_data WHERE pap_primary_data = 'primary';
 +
 
 +
Find the number of papers published in a given year and in the WB corpus
 +
SELECT COUNT(*) FROM pap_year WHERE pap_year = '2013';
 +
 
 +
Find the number of curatable papers in postgres for a given year
 +
SELECT COUNT(*) FROM pap_year WHERE pap_year = '2015' AND joinkey IN (SELECT joinkey FROM pap_primary_data WHERE pap_primary_data = 'primary');
 +
 
 +
List values in two different OA tables for one line where values are not empty - map molecule ids to chebi ids
 +
SELECT mop_chebi.mop_chebi, mop_name.mop_name FROM mop_chebi, mop_name WHERE mop_chebi.joinkey = mop_name.joinkey;
 +
 
 +
Find a paper based on a title:
 +
SELECT * FROM pap_title WHERE pap_title ~ '"title "';
 +
a script to find papers by case-insensitive substring search of the title:
 +
on tazendra  /home/azurebrd/work/parsings/karen/20150925_witting_papers/simple_paper_title_search.pl
 +
 
 +
Show count for distinct papers curated for variation phenotypes:
 
  SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE joinkey IN (SELECT joinkey FROM app_variation);<br>
 
  SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE joinkey IN (SELECT joinkey FROM app_variation);<br>
 +
 +
Show all tables of a particular OA (in this case, the phenotype app tables):
 +
SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_name ~ '^app';
 +
 +
Same query as above however this one will filter out the history tables:
 +
SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_name ~ '^app' AND table_name !~ 'hst$';
  
 
Checking for duplicate values in interaction (int_) or transgene (trp_) name tables or transgene synonym tables
 
Checking for duplicate values in interaction (int_) or transgene (trp_) name tables or transgene synonym tables

Revision as of 23:24, 8 August 2020

back to Caltech_documentation

Query page

postgres queries can be entered here

To make a query, you need to know the postgres table that carries the values you want.
Postgres table prefixes are listed in the third column in this wiki table.
More details of tables for specific datatypes are listed here.

A standard query follows this format (do not forget the ';' at the end of the query)

SELECT * FROM cur_transgene WHERE cur_transgene IS NOT NULL;

In this case, you are querying for any transgene value in the cur_transgene table where a value exists.

Queries

List PGID and WBMolname where no ChEBI value exists (note: this query disregards if one WBMolname does have a chebi, which is enough for the WBBuild to display the chebi ID for a molecule)

SELECT * FROM mop_name WHERE joinkey NOT IN (SELECT joinkey FROM mop_chebi);

List PGID and WBMolname where no ChEBI value exists for unique WBMol name

SELECT * FROM mop_name WHERE mop_name NOT IN (SELECT mop_name FROM mop_name WHERE joinkey IN (SELECT joinkey FROM mop_chebi)) ORDER BY mop_name;
the molecule objects where the objects are not in the molecule names that have a pgid, a pgid that has chebi data, sorted by molecule id

List PGID and molecule public name where no ChEBI value exists

SELECT * FROM mop_publicname WHERE joinkey NOT IN (SELECT joinkey FROM mop_chebi);

Find (list) variations with a given phenotype, show phenotype, variation, and paper (query needs help)

SELECT app_variation.app_variation, app_paper.app_paper FROM app_variation, app_paper WHERE app_variation.joinkey = app_paper.joinkey AND joinkey in (SELECT joinkey FROM app_term WHERE app_term = '<phenotype id>';

Find all papers given a specific journal name, such as "eLife" (case insensitive), and list the pubmed ID in the output

SELECT pap_journal.joinkey, pap_identifier.pap_identifier, pap_journal.pap_journal FROM pap_journal, pap_identifier WHERE pap_journal.joinkey = pap_identifier.joinkey AND pap_journal.joinkey IN (SELECT joinkey FROM pap_journal WHERE pap_journal ~* 'elife') AND pap_identifier.pap_identifier ~ 'pmid';

Find the number of curatable papers in the corpus - papers with the primary tag

SELECT COUNT (*) FROM pap_primary_data WHERE pap_primary_data = 'primary';

Find the number of papers published in a given year and in the WB corpus

SELECT COUNT(*) FROM pap_year WHERE pap_year = '2013';

Find the number of curatable papers in postgres for a given year

SELECT COUNT(*) FROM pap_year WHERE pap_year = '2015' AND joinkey IN (SELECT joinkey FROM pap_primary_data WHERE pap_primary_data = 'primary');

List values in two different OA tables for one line where values are not empty - map molecule ids to chebi ids

SELECT mop_chebi.mop_chebi, mop_name.mop_name FROM mop_chebi, mop_name WHERE mop_chebi.joinkey = mop_name.joinkey;

Find a paper based on a title:

SELECT * FROM pap_title WHERE pap_title ~ '"title "';

a script to find papers by case-insensitive substring search of the title:

on tazendra  /home/azurebrd/work/parsings/karen/20150925_witting_papers/simple_paper_title_search.pl

Show count for distinct papers curated for variation phenotypes:

SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE joinkey IN (SELECT joinkey FROM app_variation);

Show all tables of a particular OA (in this case, the phenotype app tables):

SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_name ~ '^app';

Same query as above however this one will filter out the history tables:

SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_name ~ '^app' AND table_name !~ 'hst$';

Checking for duplicate values in interaction (int_) or transgene (trp_) name tables or transgene synonym tables

SELECT int_name, COUNT(*) AS count FROM int_name GROUP BY int_name HAVING COUNT(*) > 1;
SELECT trp_name, COUNT(*) AS count FROM trp_name GROUP BY trp_name HAVING COUNT(*) > 1;
SELECT trp_synonym, COUNT(*) AS count FROM trp_synonym GROUP BY trp_synonym HAVING COUNT(*) > 1;

Show pgids where there is no WBTransgeneID assigned

SELECT * FROM trp_curator WHERE joinkey NOT IN (SELECT joinkey FROM trp_name);

Display pgids where the value in trp_synonym is a blank (note this is different from being null).

SELECT * FROM trp_synonym WHERE trp_synonym  = ;

Find information for a variation in the obo_data_variation table (in this case for WBVar00087854).

SELECT * FROM obo_data_variation WHERE obo_data_variation ~ 'WBVar00087854';

Show values in a table where there is no 'null' or 'blank' values

SELECT * FROM trp_synonym WHERE trp_synonym IS NOT NULL AND trp_synonym != ;

The query for the list of topics (i.e., list of topics where the pgid is the same for process / paper / topicpaperstatus and the topicpaperstatus is either unchecked or relevant.)

SELECT DISTINCT(pro_process.pro_process) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE
pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND
(pro_topicpaperstatus.pro_topicpaperstatus = 'unchecked' OR pro_topicpaperstatus.pro_topicpaperstatus =
'relevant') ORDER BY pro_process.pro_process

The filter for the papers given a topic (unique WBPapers where the pgid is the same for process / paper / topicpaperstatus and the topicpaperstatus is either unchecked or relevant, and the process is the chosen topic)

SELECT DISTINCT(pro_paper.pro_paper) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE 
pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND
(pro_topicpaperstatus.pro_topicpaperstatus = 'unchecked' OR pro_topicpaperstatus.pro_topicpaperstatus =
'relevant') AND pro_process.pro_process = '$topic'

To see how data are typically formatted for a given Postgres table (e.g., phenotype in RNA tables)

SELECT * FROM rna_phenotype;

To see multiple value data for a specific table:

SELECT * FROM rna_phenotype WHERE rna_phenotype ~ ',';

Other queries

Show all variations that exist in the app_tables but not in obo_name_variation hence, they will not be dumped.

SELECT * FROM app_variation WHERE app_variation NOT IN (SELECT joinkey FROM obo_name_variation);

Show all transgenes in the phenotype tables that start with "WBPaper"

SELECT * FROM app_transgene WHERE app_transgene ~ 'WBPaper';

How many papers were curated for variation-phenotype

SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE app_timestamp <
'2011-05-01' AND joinkey IN (SELECT joinkey FROM app_variation);

How many papers were curated for strain-phenotype

SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE app_timestamp <
'2011-05-01' AND joinkey IN (SELECT joinkey FROM app_strain);

How many papers were curated for transgene-phenotype

SELECT COUNT(DISTINCT app_paper) FROM app_paper WHERE app_timestamp <
'2011-05-01' AND joinkey IN (SELECT joinkey FROM app_transgene);

The number of genes represented in by all phenotype curation of variations for those same time points.

SELECT COUNT(DISTINCT app_wbgene) FROM app_wbgene WHERE app_timestamp
< '2011-05-01' AND joinkey IN (SELECT joinkey FROM app_variation);

The number of genes represented in by all phenotype curation of transgenes for those same time points.

SELECT COUNT(DISTINCT app_wbgene) FROM app_wbgene WHERE app_timestamp
< '2011-05-01' AND joinkey IN (SELECT joinkey FROM app_transgene);

How many variations have molecules?

SELECT COUNT(*) FROM app_variation WHERE joinkey IN (SELECT joinkey
FROM app_molecule);

Lines in the phenotype tables that are not dumped due to not being complete (happy); i.e., tagged with "down_right_disgusted".

SELECT * FROM app_curation_status WHERE app_curation_status != 'happy';

The values for the tempnames from that query

SELECT * FROM app_tempname WHERE joinkey IN (SELECT joinkey FROM app_tempname WHERE app_tempname !~ 'WBVar' AND joinkey IN (SELECT joinkey FROM app_type WHERE app_type = 'Allele'));

Entries in the phenotype tables that have a type 'Allele' and don't have a tempname that matches WBVar

SELECT * FROM app_tempname WHERE app_tempname !~ 'WBVar' AND joinkey IN (SELECT joinkey FROM app_type WHERE app_type = 'Allele');

Phenotype remarks in app tables by reverse timestamp

SELECT * FROM app_phen_remark ORDER BY app_timestamp DESC;