Home > OS >  SQL Timeseries - Group rows by gap in inactivity (session window)
SQL Timeseries - Group rows by gap in inactivity (session window)

Time:10-30

I'm trying to group data into sessions that are separated by inactivity, or gaps in the timestamp column, in postgres. (For reference, this is referred to as a 'session window' in the kSQL kafka streams world.)

For example, say I have a series of measurements coming in from an IoT device. Each row has a timestamp and a value, but the device does not always send data, only when needed, so there are gaps of inactivity.

    timestamp         |  value

2021-10-29 11:05:00   |  100  
2021-10-29 11:04:00   |  101  
2021-10-29 09:05:00   |  99  
2021-10-29 09:03:00   |  100  

Now I want to query this data to group the rows that occur within a 10 minute session window (separated by no rows for a certain amount of time).

The query would look like:

select  
  max(timestamp) as "end",  
  min(timestamp) as "start",  
  avg(value) as "average"  
from table  
group by  
  session_window(timestamp, 'interval 10 minutes') -- <- how to do this part?

And the output should look like:

    end               |      start            |   average

2021-10-29 11:05:00   |  2021-10-29 11:04:00  |   100.5
2021-10-29 09:05:00   |  2021-10-29 09:03:00  |   99.5

CodePudding user response:

If you want to create a 10 minutes interval based on the min and max timestamps of a given record set, you might first establish these windows within a subquery or CTE using a tsrange, and finally in the outer query aggregate the records overlap with these new windows, e.g.

CREATE TABLE iot (tmst timestamp without time zone, value numeric);

INSERT INTO iot VALUES 
('2021-10-29 11:05:00',100),
('2021-10-29 11:04:00',101),
('2021-10-29 09:05:00',99),  
('2021-10-29 09:03:00',100);

WITH j AS (
  SELECT tsrange(series,series '00:09:59'::time) AS ts_win 
  FROM (SELECT generate_series(min(tmst),max(tmst),interval '10 minutes') 
        FROM iot) i (series)  
) 
SELECT 
 lower(j.ts_win) AS start, upper(j.ts_win) AS end, avg(iot.value) AS average
FROM j
JOIN iot ON j.ts_win @> iot.tmst
GROUP BY j.ts_win;

        start        |         end         |       average        
--------------------- --------------------- ----------------------
 2021-10-29 09:03:00 | 2021-10-29 09:12:59 |  99.5000000000000000
 2021-10-29 11:03:00 | 2021-10-29 11:12:59 | 100.5000000000000000
(2 Zeilen)

CodePudding user response:

We first need to place the event timestamp into 10 minutes long time slots (tp) then group by tp and aggregate. In Postgresql 14 you can use date_bin function to do this. Prior to version 14 use function date_bin below which is an analog of PG14 function with the same return value and arguments' signature.

create or replace function date_bin
 (trunc_period interval, ts timestamptz, base_ts timestamptz default '1970-01-01Z')
 returns timestamptz language sql immutable as
$$
select
  base_ts
    floor(extract(epoch from ts - base_ts) / extract(epoch from trunc_period))::bigint
  * trunc_period;
$$;

CREATE temporary TABLE t (ts timestamp, val int);
INSERT INTO t (ts, val)
VALUES
    ('2021-10-29 11:05:00', 100),
    ('2021-10-29 11:04:00', 101),
    ('2021-10-29 09:05:00', 99),
    ('2021-10-29 09:03:00', 100);

with cte as 
(
    select date_bin(interval '10 minutes', ts) tp, val 
    from t
)
select tp "start", tp   interval '10 minutes' "end", avg(val) average
from cte
group by tp;

Or as the query in your example goes:

select  
  max(ts) as "end", min(ts) as "start", avg(val) as "average"  
from t  
group by date_bin(interval '10 minutes', ts);
 
  • Related