Using SQL, how can I create a new session if the user has been inactive for 5 minutes?
userid timestamp session
1 10:15 1_10:15
1 10:16 1_10:15
1 10:17 1_10:15
1 10:23 1_10:23
2 10:18 2_10:18
2 10:18 2_10:18
2 10:19 2_10:18
2 10:30 2_10:30
I have tried using lead and taking the difference between current and lead, but it is not giving the correct result.
e.g. lead(timestamp,1) over(partition by userid order by timestamp asc) as leadTime,
can someone plz help.
CodePudding user response:
You may use LAG
function to define a flag=1 where current row time is greater than the previous row time by more than 5 minutes, then use a cumulative sum window function (on that flag) to create unique groups for the consecutive rows that have time difference less than 5 minutes.
with get_groups as
(
select userid, timestamp, SUM(flag) over (partition by userid order by timestamp) grp
from
(
select *,
case
when unix_timestamp(timestamp) - unix_timestamp(lag(timestamp) over (partition by userid order by timestamp)) >300
then 1 else 0
end as flag
from table_name
) T
)
select userid, timestamp,
CONCAT(userid, '_', MIN(timestamp) over (partition by userid, grp)) as session
from get_groups
See a demo on MySQL. I'm not sure if this is a valid Hive syntax, but you may consider the idea of this solution.