I'm trying to group data into sessions that are separated by inactivity, or gaps in the timestamp column, in postgres. (For reference, this is referred to as a 'session window' in the kSQL kafka streams world.)
For example, say I have a series of measurements coming in from an IoT device. Each row has a timestamp and a value, but the device does not always send data, only when needed, so there are gaps of inactivity.
timestamp | value
2021-10-29 11:05:00 | 100
2021-10-29 11:04:00 | 101
2021-10-29 09:05:00 | 99
2021-10-29 09:03:00 | 100
Now I want to query this data to group the rows that occur within a 10 minute session window (separated by no rows for a certain amount of time).
The query would look like:
select
max(timestamp) as "end",
min(timestamp) as "start",
avg(value) as "average"
from table
group by
session_window(timestamp, 'interval 10 minutes') -- <- how to do this part?
And the output should look like:
end | start | average
2021-10-29 11:05:00 | 2021-10-29 11:04:00 | 100.5
2021-10-29 09:05:00 | 2021-10-29 09:03:00 | 99.5
CodePudding user response:
If you want to create a 10 minutes interval based on the min and max timestamps of a given record set, you might first establish these windows within a subquery or CTE using a tsrange
, and finally in the outer query aggregate the records overlap with these new windows, e.g.
CREATE TABLE iot (tmst timestamp without time zone, value numeric);
INSERT INTO iot VALUES
('2021-10-29 11:05:00',100),
('2021-10-29 11:04:00',101),
('2021-10-29 09:05:00',99),
('2021-10-29 09:03:00',100);
WITH j AS (
SELECT tsrange(series,series '00:09:59'::time) AS ts_win
FROM (SELECT generate_series(min(tmst),max(tmst),interval '10 minutes')
FROM iot) i (series)
)
SELECT
lower(j.ts_win) AS start, upper(j.ts_win) AS end, avg(iot.value) AS average
FROM j
JOIN iot ON j.ts_win @> iot.tmst
GROUP BY j.ts_win;
start | end | average
--------------------- --------------------- ----------------------
2021-10-29 09:03:00 | 2021-10-29 09:12:59 | 99.5000000000000000
2021-10-29 11:03:00 | 2021-10-29 11:12:59 | 100.5000000000000000
(2 Zeilen)
- Demo:
db<>fiddle
- Perhaps interesting:
How can I group data by batches of 5 seconds with postgres?
CodePudding user response:
We first need to place the event timestamp into 10 minutes long time slots (tp
) then group by tp
and aggregate. In Postgresql 14 you can use date_bin
function to do this. Prior to version 14 use function date_bin below which is an analog of PG14 function with the same return value and arguments' signature.
create or replace function date_bin
(trunc_period interval, ts timestamptz, base_ts timestamptz default '1970-01-01Z')
returns timestamptz language sql immutable as
$$
select
base_ts
floor(extract(epoch from ts - base_ts) / extract(epoch from trunc_period))::bigint
* trunc_period;
$$;
CREATE temporary TABLE t (ts timestamp, val int);
INSERT INTO t (ts, val)
VALUES
('2021-10-29 11:05:00', 100),
('2021-10-29 11:04:00', 101),
('2021-10-29 09:05:00', 99),
('2021-10-29 09:03:00', 100);
with cte as
(
select date_bin(interval '10 minutes', ts) tp, val
from t
)
select tp "start", tp interval '10 minutes' "end", avg(val) average
from cte
group by tp;
Or as the query in your example goes:
select
max(ts) as "end", min(ts) as "start", avg(val) as "average"
from t
group by date_bin(interval '10 minutes', ts);