Home > Mobile >  Postgres aggregate nested jsonb array values
Postgres aggregate nested jsonb array values

Time:05-15

In Postgres 11.x I am trying to aggregate elements in a nested jsonb object which has an array field into a single row per device_id. Here's example data for a table called configurations.

id device_id data
1 1 "{""sensors"": [{""other_data"": {}, ""sensor_type"": 1}], ""other_data"": {}}"
2 1 "{""sensors"": [{""other_data"": {}, ""sensor_type"": 1}, {""other_data"": {}, ""sensor_type"": 2}], ""other_data"": {}}"
3 1 "{""sensors"": [{""other_data"": {}, ""sensor_type"": 3}], ""other_data"": {}}"
4 2 "{""sensors"": [{""other_data"": {}, ""sensor_type"": 4}], ""other_data"": {}}"
5 2 "{""sensors"": null, ""other_data"": {}}"
6 3 "{""sensors"": [], ""other_data"": {}}"

My goal output would have a single row per device_id with an array of distinct sensor_types, example:

device_id sensor_types
1 [1,2,3]
2 [4]
3 [ ] null would also be fine here

Tried a bunch of things but running into various problems, here's some SQL to set up a test environment:

CREATE TEMPORARY TABLE configurations(
   id SERIAL PRIMARY KEY,
   device_id SERIAL,
   data JSONB
);

INSERT INTO configurations(device_id, data) VALUES
    (1, '{ "other_data": {}, "sensors": [ { "sensor_type": 1, "other_data": {} } ] }'),
    (1, '{ "other_data": {}, "sensors": [ { "sensor_type": 1, "other_data": {} }, { "sensor_type": 2, "other_data": {} }] }'),
    (1, '{ "other_data": {}, "sensors": [ { "sensor_type": 3, "other_data": {} }] }'),
    (2, '{ "other_data": {}, "sensors": [ { "sensor_type": 4, "other_data": {} }] }'),
    (2, '{ "other_data": {}, "sensors": null }'),
    (3, '{ "other_data": {}, "sensors": [] }');

Quick note, my real table has about 100,000 rows and the jsonb data is much more complicated but follows this general structure.

CodePudding user response:

The JSONB null causes some problems in Postgres and should rather be avoided when possible. You can convert the value to an empty array with the expression

coalesce(nullif(data->'sensors', 'null'), '[]')

The first attempt:

select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
group by device_id;

 device_id | sensor_types
----------- --------------
         1 | {1,2,3}
         2 | {4,NULL}
         3 | {NULL}
(3 rows)

may be unsatisfactory because of nulls in the result. When trying to remove them

select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
where value is not null
group by device_id;

 device_id | sensor_types
----------- --------------
         1 | {1,2,3}
         2 | {4}
(2 rows)

device_id = 3 disappears. Well, we can get all device_ids from the table:

select distinct device_id, sensor_types
from configurations
left join (
    select device_id, array_agg(distinct value->'sensor_type') as sensor_types
    from configurations
    left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
    where value is not null
    group by device_id
    ) s
using(device_id);

 device_id | sensor_types
----------- --------------
         1 | {1,2,3}
         2 | {4}
         3 |
(3 rows)
  • Related