I've been trying to solve this problem but failing to get the desired results. Please help as I've been trying for days.
I have the table below where click_time column is the result I need.
I need to find the time difference in unix_time_seconds column between a "click" action and the "fetch" which is before the click and has flag = 2.
The click_time column shows the difference only between the "click" row and the first row before click which has flag = 2 and event=fetch
I used the following statement but I am not sure how to go backwards and find the first row which is "fetch" and has value = "2" and take that row and subtract from the click. I used the lag() function to read previous rows but its getting stuck at getting to the previous row which has flag=2 and event = fetch
select time, unix_time_seconds,event,flag,
case when event = "click"
then unix_time_seconds - lag(unix_time_seconds, 1 ) over (order by flag desc)
end as click_time
from table_event
time | unix_time_seconds | event | flag | click_time |
---|---|---|---|---|
1/2/22 3:52:16 | 1641095536 | fetch | 2 | |
1/2/22 3:52:19 | 1641095539 | click | 0 | 3 |
1/2/22 4:59:13 | 1641099553 | fetch | 2 | |
1/2/22 4:59:21 | 1641099561 | fetch | 1 | |
1/2/22 4:59:28 | 1641099568 | fetch | 1 | |
1/2/22 4:59:35 | 1641099575 | fetch | 1 | |
1/2/22 6:51:42 | 1641106302 | fetch | 2 | |
1/2/22 6:51:57 | 1641106317 | fetch | 1 | |
1/2/22 6:51:59 | 1641106319 | click | 0 | 17 |
1/3/22 6:15:20 | 1641190520 | fetch | 2 | |
1/7/22 8:12:15 | 1641543135 | fetch | 2 | |
1/10/22 1:09:56 | 1641776996 | fetch | 2 | |
1/10/22 1:09:57 | 1641776997 | click | 0 | 1 |
1/10/22 1:12:59 | 1641777179 | fetch | 2 | |
1/10/22 1:13:01 | 1641777181 | click | 0 | 2 |
CodePudding user response:
Just adding partition by (case when event='click' or flag=2 then 1 end)
in your window function will server the purpose. This condition will ignore all the rows where the event is not 'click' or event is 'fetch' and flag is not 2.
Instead of all fifteen rows lag()
window function will only consider below 11 rows.
time | unix_time_seconds | event | flag |
---|---|---|---|
2022-02-01 03:52:16 | 1641095536 | fetch | 2 |
2022-02-01 03:52:19 | 1641095539 | click | 0 |
2022-02-01 04:59:13 | 1641099553 | fetch | 2 |
2022-02-01 06:51:42 | 1641106302 | fetch | 2 |
2022-02-01 06:51:59 | 1641106319 | click | 0 |
2022-03-01 06:15:20 | 1641190520 | fetch | 2 |
2022-07-01 08:12:15 | 1641543135 | fetch | 2 |
2022-10-01 01:09:56 | 1641776996 | fetch | 2 |
2022-10-01 01:09:57 | 1641776997 | click | 0 |
2022-10-01 01:12:59 | 1641777179 | fetch | 2 |
0122-10-01 01:13:01 | 1641777181 | click | 0 |
Schema and insert statements:
create table table_event(time datetime, unix_time_seconds int, event varchar(10), flag int);
insert into table_event values('22/2/1 3:52:16', 1641095536, 'fetch', 2);
insert into table_event values('22/2/1 3:52:19', 1641095539, 'click', 0);
insert into table_event values('22/2/1 4:59:13', 1641099553, 'fetch', 2);
insert into table_event values('22/2/1 4:59:21', 1641099561, 'fetch', 1);
insert into table_event values('22/2/1 4:59:28', 1641099568, 'fetch', 1);
insert into table_event values('22/2/1 4:59:35', 1641099575, 'fetch', 1);
insert into table_event values('22/2/1 6:51:42', 1641106302, 'fetch', 2);
insert into table_event values('22/2/1 6:51:57', 1641106317, 'fetch', 1);
insert into table_event values('22/2/1 6:51:59', 1641106319, 'click', 0);
insert into table_event values('22/3/1 6:15:20', 1641190520, 'fetch', 2);
insert into table_event values('22/7/1 8:12:15', 1641543135, 'fetch', 2);
insert into table_event values('22/10/1 1:09:56', 1641776996, 'fetch', 2);
insert into table_event values('22/10/1 1:09:57', 1641776997, 'click', 0);
insert into table_event values('22/10/1 1:12:59', 1641777179, 'fetch', 2);
insert into table_event values('122/10/1 1:13:01', 1641777181, 'click', 0);
Query:
select time, unix_time_seconds,event,flag, case when event = 'click' then unix_time_seconds - lag(unix_time_seconds, 1 ) over (partition by (case when event='click' or(event='fetch' and flag=2) then 1 end) order by time ) end as click_time
from table_event order by unix_time_seconds
Output:
time | unix_time_seconds | event | flag | click_time |
---|---|---|---|---|
2022-02-01 03:52:16 | 1641095536 | fetch | 2 | null |
2022-02-01 03:52:19 | 1641095539 | click | 0 | 3 |
2022-02-01 04:59:13 | 1641099553 | fetch | 2 | null |
2022-02-01 04:59:21 | 1641099561 | fetch | 1 | null |
2022-02-01 04:59:28 | 1641099568 | fetch | 1 | null |
2022-02-01 04:59:35 | 1641099575 | fetch | 1 | null |
2022-02-01 06:51:42 | 1641106302 | fetch | 2 | null |
2022-02-01 06:51:57 | 1641106317 | fetch | 1 | null |
2022-02-01 06:51:59 | 1641106319 | click | 0 | 17 |
2022-03-01 06:15:20 | 1641190520 | fetch | 2 | null |
2022-07-01 08:12:15 | 1641543135 | fetch | 2 | null |
2022-10-01 01:09:56 | 1641776996 | fetch | 2 | null |
2022-10-01 01:09:57 | 1641776997 | click | 0 | 1 |
2022-10-01 01:12:59 | 1641777179 | fetch | 2 | null |
22-10-01 01:13:01 | 1641777181 | click | 0 | 2 |
db<>fiddle here
CodePudding user response:
There are multiple solutions to this question, here is one of them.
Note the use of values for the creation of an ad hoc dataset.
Another way for doing that, is by using the stack function
Also note the use of timestamp literal, e.g. timestamp '2022-01-02 03:52:16'
The outer CASE statement is in order to display click_time values only for click events.
The window function is ordering the records by the unix_time_seconds and for each record takes the max unix_time_seconds up until this record (order by
in this context is actually an implied syntax for order by ... rows between unbounded preceding and current row
).
The CASE statement within the window function makse sure that we're looking only on fetch events with 2 flag.
with t (time, unix_time_seconds, event, flag)
as
(
select *
from values (timestamp '2022-01-02 03:52:16', 1641095536, 'fetch', 2)
,(timestamp '2022-01-02 03:52:19', 1641095539, 'click', 0)
,(timestamp '2022-01-02 04:59:13', 1641099553, 'fetch', 2)
,(timestamp '2022-01-02 04:59:21', 1641099561, 'fetch', 1)
,(timestamp '2022-01-02 04:59:28', 1641099568, 'fetch', 1)
,(timestamp '2022-01-02 04:59:35', 1641099575, 'fetch', 1)
,(timestamp '2022-01-02 06:51:42', 1641106302, 'fetch', 2)
,(timestamp '2022-01-02 06:51:57', 1641106317, 'fetch', 1)
,(timestamp '2022-01-02 06:51:59', 1641106319, 'click', 0)
,(timestamp '2022-01-03 06:15:20', 1641190520, 'fetch', 2)
,(timestamp '2022-01-07 08:12:15', 1641543135, 'fetch', 2)
,(timestamp '2022-01-10 01:09:56', 1641776996, 'fetch', 2)
,(timestamp '2022-01-10 01:09:57', 1641776997, 'click', 0)
,(timestamp '2022-01-10 01:12:59', 1641777179, 'fetch', 2)
,(timestamp '2022-01-10 01:13:01', 1641777181, 'click', 0)
)
select *
,case
when event == 'click'
then unix_time_seconds
- max(case
when event = 'fetch' and flag = 2
then unix_time_seconds
end
) over (order by unix_time_seconds)
end as click_time
from t
time | unix_time_seconds | event | flag | click_time |
---|---|---|---|---|
2022-01-02T03:52:16.000 0000 | 1641095536 | fetch | 2 | null |
2022-01-02T03:52:19.000 0000 | 1641095539 | click | 0 | 3 |
2022-01-02T04:59:13.000 0000 | 1641099553 | fetch | 2 | null |
2022-01-02T04:59:21.000 0000 | 1641099561 | fetch | 1 | null |
2022-01-02T04:59:28.000 0000 | 1641099568 | fetch | 1 | null |
2022-01-02T04:59:35.000 0000 | 1641099575 | fetch | 1 | null |
2022-01-02T06:51:42.000 0000 | 1641106302 | fetch | 2 | null |
2022-01-02T06:51:57.000 0000 | 1641106317 | fetch | 1 | null |
2022-01-02T06:51:59.000 0000 | 1641106319 | click | 0 | 17 |
2022-01-03T06:15:20.000 0000 | 1641190520 | fetch | 2 | null |
2022-01-07T08:12:15.000 0000 | 1641543135 | fetch | 2 | null |
2022-01-10T01:09:56.000 0000 | 1641776996 | fetch | 2 | null |
2022-01-10T01:09:57.000 0000 | 1641776997 | click | 0 | 1 |
2022-01-10T01:12:59.000 0000 | 1641777179 | fetch | 2 | null |
2022-01-10T01:13:01.000 0000 | 1641777181 | click | 0 | 2 |
This solution was tested on Azure Databricks, RT 10.1 with Apache Spark 3.2.0
CodePudding user response:
This should do the trick:
select time, unix_time_seconds,event,flag,
case when event = 'click' then
unix_time_seconds - (select top 1 unix_time_seconds from table_event y where event = 'fetch' and flag = 2 AND y.unix_time_seconds <= z.unix_time_seconds order by y.unix_time_seconds desc)
end as click_time from table_event z order by unix_time_seconds