Merge Postgres ranges that are not more than 5 minutes apart-CodePudding

I have a time_entries table, where I use Postgres' 15 multi-range feature to merge overlapping and adjacent time ranges:

CREATE TABLE time_entries (
    id bigint NOT NULL,
    contract_id bigint,
    "from" timestamp(6) without time zone,
    "to" timestamp(6) without time zone,
    range tsrange GENERATED ALWAYS AS (tsrange("from", "to")) STORED
);

INSERT INTO time_entries VALUES (1, 1, '2022-12-07T09:00', '2022-12-07T09:45');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T09:45', '2022-12-07T10:00');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T09:55', '2022-12-07T10:15');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T10:20', '2022-12-07T10:30');
INSERT INTO time_entries VALUES (2, 1, '2022-12-07T10:45', '2022-12-07T11:00');

SELECT contract_id, unnest(range_agg(range)) AS range FROM time_entries GROUP BY contract_id;

The current result is:

contract_id	range
1	["2022-12-07 09:00:00","2022-12-07 10:15:00")
1	["2022-12-07 10:20:00","2022-12-07 10:30:00")
1	["2022-12-07 10:45:00","2022-12-07 11:00:00")

Now, when two ranges are only up to 5 minutes apart, I'd like to merge them as well. So the desired result would be:

contract_id	range
1	["2022-12-07 09:00:00","2022-12-07 10:30:00")
1	["2022-12-07 10:45:00","2022-12-07 11:00:00")

Working dbfiddle here: https://dbfiddle.uk/owHkVaZ5

Is this achievable with SQL alone? Or do I need some kind of custom Postgres function for this? I've heard of an aggregator function, but never used one.

CodePudding user response：

Richard Huxton's idea of adding 5 minutes before you aggregate is good. Here is a simple implementation:

SELECT contract_id,
       tsrange(lower(u.r), upper(u.r) - INTERVAL '5 minutes')
FROM (SELECT contract_id,
             range_agg(tsrange("from", "to"   INTERVAL '5 minutes')) AS mr
      FROM time_entries
      GROUP BY contract_id) AS agg
   CROSS JOIN LATERAL unnest(agg.mr) AS u(r);

You need the CROSS JOIN because you want to join each group with all the multirange elements that belong to it. This "belong" is expressed by LATERAL which means that you refer to elements from the preceding FROM list entries in a later one. Essentially, that construct and the subquery are needed so that I can get the unnested ranges into the FROM expression where they belong, so that * can use them in the SELECT list.

AS u(r) is a table alias, that is an alias for a table name and the name of its columns at the same time.