I need to analyze survey data (stored in records) where a question can have a choice of options. My goal is to identify the answers given that were NOT within the range of allowed options for this question. However, my query returns everything (I suspect a subquery) and I don't know how to fix it.
Schema
The records stores its data in the data
JSONB column. There, the keys are question
UIDs, e.g. uid00000006
has the answer option1
. option1
is a choice to select.
(Not all questions need to have a dropdown, so some other value is fine such as 42.)
{"uid00000006": {"value": "option1"}, "uid00000008": {"value": 42}}
A question
optionally has a reference to a optionset
(the dropdown) which has a range of optionvalues
(the values of the dropdown) , e.g. option1
, option2
, option3
etc.
create table record
(
recordid bigint not null primary key,
uid varchar(11) unique,
data jsonb default '{}'::jsonb not null
);
create table question
(
questionid bigint not null primary key,
uid varchar(11) not null unique,
optionsetid bigint
);
create table optionset
(
optionsetid bigint not null primary key,
uid varchar(11) not null unique
);
create table optionvalue
(
optionvalueid bigint not null primary key,
uid varchar(11) not null unique,
code varchar(230) not null,
optionsetid bigint
);
-- create optionset
INSERT INTO optionset (optionsetid, uid) VALUES (1, 'uid00000001');
-- insert optionvalues into optionset
INSERT INTO optionvalue (optionvalueid, uid, code, optionsetid) VALUES (100, 'uid00000002', 'option1', 1);
INSERT INTO optionvalue (optionvalueid, uid, code, optionsetid) VALUES (101, 'uid00000003', 'option2', 1);
INSERT INTO optionvalue (optionvalueid, uid, code, optionsetid) VALUES (102, 'uid00000004', 'option3', 1);
INSERT INTO optionvalue (optionvalueid, uid, code, optionsetid) VALUES (103, 'uid00000005', 'option4', 1);
-- insert questions
INSERT INTO question (questionid, uid, optionsetid) VALUES (1001, 'uid00000006', 1);
INSERT INTO question (questionid, uid, optionsetid) VALUES (1002, 'uid00000007', 1);
INSERT INTO question (questionid, uid, optionsetid) VALUES (1003, 'uid00000008', NULL);
-- insert records
INSERT INTO record (recordid, uid, data) VALUES (10001, 'uid00000009', '{"uid00000006": {"value": "option1"}, "uid00000008": {"value": 42}}'::jsonb);
INSERT INTO record (recordid, uid, data) VALUES (10002, 'uid00000010', '{"uid00000006": {"value": "option2"}}'::jsonb);
INSERT INTO record (recordid, uid, data) VALUES (10003, 'uid00000011', '{"uid00000006": {"value": "UNMAPPED"}}'::jsonb);
My query
My drafted query is:
SELECT r.uid AS record_uid,
key AS question_uid,
os.uid AS optionset_uid,
value ->> 'value' AS value
FROM record r, JSONB_EACH(r.data)
JOIN question q ON q.uid = key
JOIN optionset os ON q.optionsetid = os.optionsetid
WHERE q.optionsetid IS NOT NULL
AND value::varchar NOT IN (SELECT DISTINCT code FROM optionvalue WHERE optionsetid = q.optionsetid)
;
Problem
The query above returns all records instead only one. In reference to the sample data, the expected result would be to return only the record where the value is UNMAPPED
(meaning it is the record where an answer was given that is not "valid").
CodePudding user response:
You should change value::varchar NOT IN
to value ->> 'value' NOT IN
SELECT
r.uid AS record_uid,
key AS question_uid,
os.uid AS optionset_uid,
value ->> 'value' AS value
FROM
record r, jsonb_each(r.data)
JOIN question q ON q.uid = key
JOIN optionset os ON q.optionsetid = os.optionsetid
WHERE
q.optionsetid IS NOT NULL
AND value ->> 'value' NOT IN (SELECT DISTINCT code FROM optionvalue WHERE optionsetid = q.optionsetid);