Home > Software engineering >  How to retrieve rows using SQL Lag() function with condition
How to retrieve rows using SQL Lag() function with condition

Time:03-03

I've been trying to solve this problem but failing to get the desired results. Please help as I've been trying for days.

I have the table below where click_time column is the result I need.

I need to find the time difference in unix_time_seconds column between a "click" action and the "fetch" which is before the click and has flag = 2.

The click_time column shows the difference only between the "click" row and the first row before click which has flag = 2 and event=fetch

I used the following statement but I am not sure how to go backwards and find the first row which is "fetch" and has value = "2" and take that row and subtract from the click. I used the lag() function to read previous rows but its getting stuck at getting to the previous row which has flag=2 and event = fetch

 select time, unix_time_seconds,event,flag,
 case when event = "click"
   then unix_time_seconds - lag(unix_time_seconds, 1 ) over (order by flag desc)
 end as click_time

 from table_event
time unix_time_seconds event flag click_time
1/2/22 3:52:16 1641095536 fetch 2
1/2/22 3:52:19 1641095539 click 0 3
1/2/22 4:59:13 1641099553 fetch 2
1/2/22 4:59:21 1641099561 fetch 1
1/2/22 4:59:28 1641099568 fetch 1
1/2/22 4:59:35 1641099575 fetch 1
1/2/22 6:51:42 1641106302 fetch 2
1/2/22 6:51:57 1641106317 fetch 1
1/2/22 6:51:59 1641106319 click 0 17
1/3/22 6:15:20 1641190520 fetch 2
1/7/22 8:12:15 1641543135 fetch 2
1/10/22 1:09:56 1641776996 fetch 2
1/10/22 1:09:57 1641776997 click 0 1
1/10/22 1:12:59 1641777179 fetch 2
1/10/22 1:13:01 1641777181 click 0 2

CodePudding user response:

Just adding partition by (case when event='click' or flag=2 then 1 end) in your window function will server the purpose. This condition will ignore all the rows where the event is not 'click' or event is 'fetch' and flag is not 2.

Instead of all fifteen rows lag() window function will only consider below 11 rows.

time unix_time_seconds event flag
2022-02-01 03:52:16 1641095536 fetch 2
2022-02-01 03:52:19 1641095539 click 0
2022-02-01 04:59:13 1641099553 fetch 2
2022-02-01 06:51:42 1641106302 fetch 2
2022-02-01 06:51:59 1641106319 click 0
2022-03-01 06:15:20 1641190520 fetch 2
2022-07-01 08:12:15 1641543135 fetch 2
2022-10-01 01:09:56 1641776996 fetch 2
2022-10-01 01:09:57 1641776997 click 0
2022-10-01 01:12:59 1641777179 fetch 2
0122-10-01 01:13:01 1641777181 click 0

Schema and insert statements:

 create table table_event(time datetime,    unix_time_seconds int,  event varchar(10), flag int);
 insert into table_event values('22/2/1 3:52:16',   1641095536, 'fetch',    2);
 insert into table_event values('22/2/1 3:52:19',   1641095539, 'click',    0);
 insert into table_event values('22/2/1 4:59:13',   1641099553, 'fetch',    2);
 insert into table_event values('22/2/1 4:59:21',   1641099561, 'fetch',    1);
 insert into table_event values('22/2/1 4:59:28',   1641099568, 'fetch',    1);     
 insert into table_event values('22/2/1 4:59:35',   1641099575, 'fetch',    1);
 insert into table_event values('22/2/1 6:51:42',   1641106302, 'fetch',    2);
 insert into table_event values('22/2/1 6:51:57',   1641106317, 'fetch',    1);
 insert into table_event values('22/2/1 6:51:59',   1641106319, 'click',    0);
 insert into table_event values('22/3/1 6:15:20',   1641190520, 'fetch',    2);
 insert into table_event values('22/7/1 8:12:15',   1641543135, 'fetch',    2); 
 insert into table_event values('22/10/1 1:09:56',  1641776996, 'fetch',    2); 
 insert into table_event values('22/10/1 1:09:57',  1641776997, 'click',    0);
 insert into table_event values('22/10/1 1:12:59',  1641777179, 'fetch',    2); 
 insert into table_event values('122/10/1 1:13:01', 1641777181, 'click',    0);

Query:

select time, unix_time_seconds,event,flag, case when event = 'click' then unix_time_seconds - lag(unix_time_seconds, 1 ) over (partition by (case when event='click' or(event='fetch' and flag=2) then 1 end) order by time ) end as click_time

from table_event order by unix_time_seconds

Output:

time unix_time_seconds event flag click_time
2022-02-01 03:52:16 1641095536 fetch 2 null
2022-02-01 03:52:19 1641095539 click 0 3
2022-02-01 04:59:13 1641099553 fetch 2 null
2022-02-01 04:59:21 1641099561 fetch 1 null
2022-02-01 04:59:28 1641099568 fetch 1 null
2022-02-01 04:59:35 1641099575 fetch 1 null
2022-02-01 06:51:42 1641106302 fetch 2 null
2022-02-01 06:51:57 1641106317 fetch 1 null
2022-02-01 06:51:59 1641106319 click 0 17
2022-03-01 06:15:20 1641190520 fetch 2 null
2022-07-01 08:12:15 1641543135 fetch 2 null
2022-10-01 01:09:56 1641776996 fetch 2 null
2022-10-01 01:09:57 1641776997 click 0 1
2022-10-01 01:12:59 1641777179 fetch 2 null
22-10-01 01:13:01 1641777181 click 0 2

db<>fiddle here

CodePudding user response:

There are multiple solutions to this question, here is one of them.

Note the use of values for the creation of an ad hoc dataset.
Another way for doing that, is by using the stack function Also note the use of timestamp literal, e.g. timestamp '2022-01-02 03:52:16'

The outer CASE statement is in order to display click_time values only for click events.

The window function is ordering the records by the unix_time_seconds and for each record takes the max unix_time_seconds up until this record (order by in this context is actually an implied syntax for order by ... rows between unbounded preceding and current row).
The CASE statement within the window function makse sure that we're looking only on fetch events with 2 flag.

with t (time, unix_time_seconds, event, flag)
as
(
  select   *
  from     values  (timestamp '2022-01-02 03:52:16', 1641095536, 'fetch', 2)
                  ,(timestamp '2022-01-02 03:52:19', 1641095539, 'click', 0)
                  ,(timestamp '2022-01-02 04:59:13', 1641099553, 'fetch', 2)
                  ,(timestamp '2022-01-02 04:59:21', 1641099561, 'fetch', 1)
                  ,(timestamp '2022-01-02 04:59:28', 1641099568, 'fetch', 1)
                  ,(timestamp '2022-01-02 04:59:35', 1641099575, 'fetch', 1)
                  ,(timestamp '2022-01-02 06:51:42', 1641106302, 'fetch', 2)
                  ,(timestamp '2022-01-02 06:51:57', 1641106317, 'fetch', 1)
                  ,(timestamp '2022-01-02 06:51:59', 1641106319, 'click', 0)
                  ,(timestamp '2022-01-03 06:15:20', 1641190520, 'fetch', 2)
                  ,(timestamp '2022-01-07 08:12:15', 1641543135, 'fetch', 2)
                  ,(timestamp '2022-01-10 01:09:56', 1641776996, 'fetch', 2)
                  ,(timestamp '2022-01-10 01:09:57', 1641776997, 'click', 0)
                  ,(timestamp '2022-01-10 01:12:59', 1641777179, 'fetch', 2)
                  ,(timestamp '2022-01-10 01:13:01', 1641777181, 'click', 0)
)
select    *
         ,case 
         
             when  event == 'click' 
             
             then  unix_time_seconds 
             
                 - max(case 
                           when event = 'fetch' and flag = 2 
                           then unix_time_seconds 
                       end
                       ) over (order by unix_time_seconds)
          end as click_time
          
from      t
time unix_time_seconds event flag click_time
2022-01-02T03:52:16.000 0000 1641095536 fetch 2 null
2022-01-02T03:52:19.000 0000 1641095539 click 0 3
2022-01-02T04:59:13.000 0000 1641099553 fetch 2 null
2022-01-02T04:59:21.000 0000 1641099561 fetch 1 null
2022-01-02T04:59:28.000 0000 1641099568 fetch 1 null
2022-01-02T04:59:35.000 0000 1641099575 fetch 1 null
2022-01-02T06:51:42.000 0000 1641106302 fetch 2 null
2022-01-02T06:51:57.000 0000 1641106317 fetch 1 null
2022-01-02T06:51:59.000 0000 1641106319 click 0 17
2022-01-03T06:15:20.000 0000 1641190520 fetch 2 null
2022-01-07T08:12:15.000 0000 1641543135 fetch 2 null
2022-01-10T01:09:56.000 0000 1641776996 fetch 2 null
2022-01-10T01:09:57.000 0000 1641776997 click 0 1
2022-01-10T01:12:59.000 0000 1641777179 fetch 2 null
2022-01-10T01:13:01.000 0000 1641777181 click 0 2

This solution was tested on Azure Databricks, RT 10.1 with Apache Spark 3.2.0

CodePudding user response:

This should do the trick:

select time, unix_time_seconds,event,flag,
case when event = 'click' then 
    unix_time_seconds - (select top 1 unix_time_seconds from table_event y where event = 'fetch' and flag = 2 AND y.unix_time_seconds <= z.unix_time_seconds order by y.unix_time_seconds desc) 
end as click_time  from table_event z order by unix_time_seconds
  • Related