In Postgres 11.x I am trying to aggregate elements in a nested jsonb object which has an array field into a single row per device_id. Here's example data for a table called configurations
.
id | device_id | data |
---|---|---|
1 | 1 | "{""sensors"": [{""other_data"": {}, ""sensor_type"": 1}], ""other_data"": {}}" |
2 | 1 | "{""sensors"": [{""other_data"": {}, ""sensor_type"": 1}, {""other_data"": {}, ""sensor_type"": 2}], ""other_data"": {}}" |
3 | 1 | "{""sensors"": [{""other_data"": {}, ""sensor_type"": 3}], ""other_data"": {}}" |
4 | 2 | "{""sensors"": [{""other_data"": {}, ""sensor_type"": 4}], ""other_data"": {}}" |
5 | 2 | "{""sensors"": null, ""other_data"": {}}" |
6 | 3 | "{""sensors"": [], ""other_data"": {}}" |
My goal output would have a single row per device_id with an array of distinct sensor_types, example:
device_id | sensor_types |
---|---|
1 | [1,2,3] |
2 | [4] |
3 | [ ] null would also be fine here |
Tried a bunch of things but running into various problems, here's some SQL to set up a test environment:
CREATE TEMPORARY TABLE configurations(
id SERIAL PRIMARY KEY,
device_id SERIAL,
data JSONB
);
INSERT INTO configurations(device_id, data) VALUES
(1, '{ "other_data": {}, "sensors": [ { "sensor_type": 1, "other_data": {} } ] }'),
(1, '{ "other_data": {}, "sensors": [ { "sensor_type": 1, "other_data": {} }, { "sensor_type": 2, "other_data": {} }] }'),
(1, '{ "other_data": {}, "sensors": [ { "sensor_type": 3, "other_data": {} }] }'),
(2, '{ "other_data": {}, "sensors": [ { "sensor_type": 4, "other_data": {} }] }'),
(2, '{ "other_data": {}, "sensors": null }'),
(3, '{ "other_data": {}, "sensors": [] }');
Quick note, my real table has about 100,000 rows and the jsonb data is much more complicated but follows this general structure.
CodePudding user response:
The JSONB null
causes some problems in Postgres and should rather be avoided when possible. You can convert the value to an empty array with the expression
coalesce(nullif(data->'sensors', 'null'), '[]')
The first attempt:
select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
group by device_id;
device_id | sensor_types
----------- --------------
1 | {1,2,3}
2 | {4,NULL}
3 | {NULL}
(3 rows)
may be unsatisfactory because of nulls
in the result. When trying to remove them
select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
where value is not null
group by device_id;
device_id | sensor_types
----------- --------------
1 | {1,2,3}
2 | {4}
(2 rows)
device_id = 3
disappears. Well, we can get all device_ids
from the table:
select distinct device_id, sensor_types
from configurations
left join (
select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
where value is not null
group by device_id
) s
using(device_id);
device_id | sensor_types
----------- --------------
1 | {1,2,3}
2 | {4}
3 |
(3 rows)