Home > Back-end >  Query JSONB column using joins to filter by subquery referencing outer query
Query JSONB column using joins to filter by subquery referencing outer query

Time:10-21

I need to analyze survey data (stored in records) where a question can have a choice of options. My goal is to identify the answers given that were NOT within the range of allowed options for this question. However, my query returns everything (I suspect a subquery) and I don't know how to fix it.

Schema

The records stores its data in the data JSONB column. There, the keys are question UIDs, e.g. uid00000006 has the answer option1. option1 is a choice to select. (Not all questions need to have a dropdown, so some other value is fine such as 42.)

{"uid00000006": {"value": "option1"}, "uid00000008": {"value": 42}}

A question optionally has a reference to a optionset (the dropdown) which has a range of optionvalues (the values of the dropdown) , e.g. option1, option2, option3 etc.

create table record
(
    recordid bigint not null primary key,
    uid varchar(11) unique,
    data jsonb default '{}'::jsonb not null
);


create table question
(
    questionid bigint not null primary key,
    uid varchar(11) not null unique,
    optionsetid bigint 
);

create table optionset
(
    optionsetid bigint not null primary key,
    uid varchar(11) not null unique
);

create table optionvalue
(
    optionvalueid bigint not null primary key,
    uid varchar(11) not null unique,
    code varchar(230) not null,
    optionsetid bigint
);



-- create optionset
INSERT INTO optionset (optionsetid, uid) VALUES (1, 'uid00000001');

-- insert optionvalues into optionset
INSERT INTO optionvalue (optionvalueid, uid, code, optionsetid) VALUES (100, 'uid00000002', 'option1', 1);
INSERT INTO optionvalue (optionvalueid, uid, code, optionsetid) VALUES (101, 'uid00000003', 'option2', 1);
INSERT INTO optionvalue (optionvalueid, uid, code, optionsetid) VALUES (102, 'uid00000004', 'option3', 1);
INSERT INTO optionvalue (optionvalueid, uid, code, optionsetid) VALUES (103, 'uid00000005', 'option4', 1);

-- insert questions
INSERT INTO question (questionid, uid, optionsetid) VALUES (1001, 'uid00000006', 1);
INSERT INTO question (questionid, uid, optionsetid) VALUES (1002, 'uid00000007', 1);
INSERT INTO question (questionid, uid, optionsetid) VALUES (1003, 'uid00000008', NULL);

-- insert records
INSERT INTO record (recordid, uid, data) VALUES (10001, 'uid00000009', '{"uid00000006": {"value": "option1"}, "uid00000008": {"value": 42}}'::jsonb);
INSERT INTO record (recordid, uid, data) VALUES (10002, 'uid00000010', '{"uid00000006": {"value": "option2"}}'::jsonb);
INSERT INTO record (recordid, uid, data) VALUES (10003, 'uid00000011', '{"uid00000006": {"value": "UNMAPPED"}}'::jsonb);

My query

My drafted query is:

SELECT r.uid             AS record_uid,
       key               AS question_uid,
       os.uid            AS optionset_uid,
       value ->> 'value' AS value
FROM record r, JSONB_EACH(r.data)
JOIN question q ON q.uid = key
JOIN optionset os ON q.optionsetid = os.optionsetid
WHERE q.optionsetid IS NOT NULL
AND value::varchar NOT IN (SELECT DISTINCT code FROM optionvalue WHERE optionsetid = q.optionsetid)
;

DBFiddle

Problem

The query above returns all records instead only one. In reference to the sample data, the expected result would be to return only the record where the value is UNMAPPED (meaning it is the record where an answer was given that is not "valid").

CodePudding user response:

You should change value::varchar NOT IN to value ->> 'value' NOT IN

SELECT
  r.uid             AS record_uid,
  key               AS question_uid,
  os.uid            AS optionset_uid,
  value ->> 'value' AS value
FROM
  record r, jsonb_each(r.data)
  JOIN question q ON q.uid = key
  JOIN optionset os ON q.optionsetid = os.optionsetid
WHERE 
 q.optionsetid IS NOT NULL
 AND value ->> 'value' NOT IN (SELECT DISTINCT code FROM optionvalue WHERE optionsetid = q.optionsetid);
  • Related