I want to increase the row number of a partition based on a condition. This question refers to the same problem, but in my case, the column I want to condition on is another window function.
I want to identify the session number of each user (id
) depending on how long ago was their last recorded action (ts
).
My table looks as follows:
id ts
1 2022-08-01 09:00:00 -- user 1, first session
1 2022-08-01 09:10:00
1 2022-08-01 09:12:00
1 2022-08-03 12:00:00 -- user 1, second session
1 2022-08-03 12:03:00
2 2022-08-01 11:04:00 -- user 2, first session
2 2022-08-01 11:07:00
2 2022-08-25 10:30:00 -- user 2, second session
2 2022-08-25 10:35:00
2 2022-08-25 10:36:00
I want to assign each user a session identifier based on the following conditions:
- If the user's last action was 30 or more minutes ago (or doesn't exist), then increase (or initialize) the row number.
- If the user's last action was less than 30 minutes ago, don't increase the row number.
I want to get the following result:
id ts session_id
1 2022-08-01 09:00:00 1
1 2022-08-01 09:10:00 1
1 2022-08-01 09:12:00 1
1 2022-08-03 12:00:00 2
1 2022-08-03 12:03:00 2
2 2022-08-01 11:04:00 1
2 2022-08-01 11:07:00 1
2 2022-08-25 10:30:00 2
2 2022-08-25 10:35:00 2
2 2022-08-25 10:36:00 2
If I had a separate column with the seconds since their last session, I could simply add 1 to each user's partitioned sum. However, this column is a window function itself. Hence, the following query doesn't work:
select
id
,ts
,extract(
epoch from (
ts - lag(ts, 1) over(partition by id order by ts)
)
) as seconds_since -- Number of seconds since last action (works well)
,sum(
case
when coalesce(
extract(
epoch from (
ts - lag(ts, 1) over (partition by id order by ts)
)
), 1800
) >= 1800 then 1
else 0 end
) over (partition by id order by ts) as session_id -- Window inside window (crashes)
from
t
order by
id
,ts
ERROR: Aggregate window functions with an ORDER BY clause require a frame clause
CodePudding user response:
Use LAG()
window function to get the previous ts
of each row and create flag column indicating if the difference between the 2 timestamps is greater than 30 minutes.
Then use SUM()
window function over that flag:
SELECT
id
,ts
,SUM(flag) OVER (
PARTITION BY id
ORDER BY ts
rows unbounded preceding -- necessary in aws-redshift
) as session_id
FROM (
SELECT
*
,COALESCE((LAG(ts) OVER (PARTITION BY id ORDER BY ts) < ts - INTERVAL '30 minute')::int, 1) flag
FROM
tablename
) t
;
See the demo.