I have a table called activity
that contains values like the following:
userId | timestamp | action
----------------------------------------
1 | 2022-10-18 10:00:00 | OPEN
2 | 2022-10-18 10:20:00 | OPEN
1 | 2022-10-18 10:05:00 | CLOSE
2 | 2022-10-18 10:22:00 | CLOSE
...
I want to be able to get the difference between the open and close time per user for a given day like so:
desiredTable
------------------------------------------------------
userId | start_time | time_elapsed_in_minutes
------------------------------------------------------
1 | 2022-10-18 10:00:00 | 5
2 | 2022-10-18 10:20:00 | 2
A couple of things to note are:
- There is no guarantee
OPEN
andCLOSE
rows will be back to back to each other as the table also holds a lot of other action types. - Also, there is no guarantee that there will be the same number of
OPEN
andCLOSE
rows due to network conditions that could result in either not being reported. I.e: user 1 can have 3 opens and 1 close, so only 1 pair needs to be calculated.
My approach:
- create table of OPEN and CLOSE counts per user
userId | # opens | # closes
- determine which number is less (# opens or # closes) and get that # of rows from the activity table
- opens table with the # of rows determined above
userId | timestamp
- closes table with the # of rows determined above
userId | timestamp
- subtract close timestamp from open timestamp group by userId and truncate to minutes
Any help would be greatly appreciated!
CodePudding user response:
We group every action = 'close'
with the action = 'open'
that comes before it. Then we group by id and choose the timestamps.
select userid
,min(timestamp) as start_time
,max(timestamp)-min(timestamp) as time_elapsed_in_minutes
from (
select *
,count(case action when 'OPEN' then 1 end) over(partition by userid order by timestamp) as grp
from t
where action in('OPEN', 'CLOSE')
) t
group by userid, grp
userid | start_time | time_elapsed_in_minutes |
---|---|---|
1 | 2022-10-18 10:00:00 | 00:05:00 |
2 | 2022-10-18 10:20:00 | 00:02:00 |
CodePudding user response:
We can use lead()
when next action
is closed
.
select *
from (
select userid
,timestamp as start_time
,case lead(action) over(partition by userid order by timestamp) when 'CLOSE' then lead(timestamp) over(partition by userid order by timestamp)-timestamp end as time_elapsed_in_minutes
from t
where action in('OPEN', 'CLOSE')
) t
where time_elapsed_in_minutes is not null
userid | start_time | time_elapsed_in_minutes |
---|---|---|
1 | 2022-10-18 10:00:00 | 00:05:00 |
2 | 2022-10-18 10:20:00 | 00:02:00 |