I have a time_entries
table, where I use Postgres' 15 multi-range feature to merge overlapping and adjacent time ranges:
CREATE TABLE time_entries (
id bigint NOT NULL,
contract_id bigint,
"from" timestamp(6) without time zone,
"to" timestamp(6) without time zone,
range tsrange GENERATED ALWAYS AS (tsrange("from", "to")) STORED
);
INSERT INTO time_entries VALUES (1, 1, '2022-12-07T09:00', '2022-12-07T09:45');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T09:45', '2022-12-07T10:00');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T09:55', '2022-12-07T10:15');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T10:20', '2022-12-07T10:30');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T10:45', '2022-12-07T11:00');
SELECT contract_id, unnest(range_agg(range)) AS range FROM time_entries GROUP BY contract_id;
The current result is:
contract_id | range |
---|---|
1 | ["2022-12-07 09:00:00","2022-12-07 10:15:00") |
1 | ["2022-12-07 10:20:00","2022-12-07 10:30:00") |
1 | ["2022-12-07 10:45:00","2022-12-07 11:00:00") |
Now, when two ranges are only up to 5 minutes apart, I'd like to merge them as well. So the desired result would be:
contract_id | range |
---|---|
1 | ["2022-12-07 09:00:00","2022-12-07 10:30:00") |
1 | ["2022-12-07 10:45:00","2022-12-07 11:00:00") |
Working dbfiddle here: https://dbfiddle.uk/owHkVaZ5
Is this achievable with SQL alone? Or do I need some kind of custom Postgres function for this? I've heard of an aggregator function, but never used one.
CodePudding user response:
Richard Huxton's idea of adding 5 minutes before you aggregate is good. Here is a simple implementation:
SELECT contract_id,
tsrange(lower(u.r), upper(u.r) - INTERVAL '5 minutes')
FROM (SELECT contract_id,
range_agg(tsrange("from", "to" INTERVAL '5 minutes')) AS mr
FROM time_entries
GROUP BY contract_id) AS agg
CROSS JOIN LATERAL unnest(agg.mr) AS u(r);
You need the CROSS JOIN
because you want to join each group with all the multirange elements that belong to it. This "belong" is expressed by LATERAL
which means that you refer to elements from the preceding FROM
list entries in a later one. Essentially, that construct and the subquery are needed so that I can get the unnested ranges into the FROM
expression where they belong, so that * can use them in the SELECT
list.
AS u(r)
is a table alias, that is an alias for a table name and the name of its columns at the same time.