I've got the following ordered by user_id, timestamp table:
user_id | timestamp |
---|---|
1000 | 1661919816 |
1000 | 1661919861 |
1001 | 1661919816 |
I need to enumerate this table with session id by the following rules:
- When diff between two timestamps is less than 1 minute, then it is the same session
- When diff is more than 1 minute, then it is previous session_id 1
- When user_id changes, so does session_id (again previous session_id 1)
So in my case I want something like this:
user_id | timestamp | session_id |
---|---|---|
1000 | 1661919816 | 1 |
1000 | 1661919861 | 1 |
1001 | 1661919816 | 2 |
I need something like dense_rank()
, however I don't know how to apply it to diff between two timestamps
CodePudding user response:
You can use the LAG
window function to parse the previous user_id
and timestamp
values, then you can check if the user_id
value has changed, or if it is the same, check if the timestamp
difference is greater than 1 minute (60 seconds). You can then use the SUM
window function to calculate the value of session_id
.
Such a query might look like this
SELECT
user_id,
timestamp,
SUM(flag) OVER (ORDER BY user_id, timestamp) session_id
FROM (
SELECT
*,
CASE WHEN user_id = LAG(user_id) OVER (ORDER BY user_id, timestamp)
AND timestamp - LAG(timestamp, 1, 0) OVER (ORDER BY user_id, timestamp) < 60
THEN 0 -- user_id is not changed and timestamp difference is smaller than 1 minute, do not increment session_id
ELSE 1 -- increment session_id since user_id is changed or timestamp difference is greater than 1 minute
END flag
FROM sessions
) flags
For your sample data, query output
user_id | timestamp | session_id |
---|---|---|
1000 | 1661919816 | 1 |
1000 | 1661919861 | 1 |
1001 | 1661919816 | 2 |
For extended sample data like this
user_id | timestamp |
---|---|
1000 | 1661919816 |
1000 | 1661919861 |
1000 | 1661919871 |
1000 | 1661919931 |
1001 | 1661919816 |
1001 | 1661919896 |
1001 | 1661919906 |
query output
user_id | timestamp | session_id |
---|---|---|
1000 | 1661919816 | 1 |
1000 | 1661919861 | 1 |
1000 | 1661919871 | 1 |
1000 | 1661919931 | 2 |
1001 | 1661919816 | 3 |
1001 | 1661919896 | 4 |
1001 | 1661919906 | 4 |
db<>fiddle here