Home > other >  Find customers with at least 5 transactions in At most 3 consecutive days
Find customers with at least 5 transactions in At most 3 consecutive days

Time:03-13

I have a table in SQL Server that contains customers' transactions From 2022-02-10 to 2022-03-10.

I want to find customers that have at least 5 transactions on At most three consecutive days

For example, output of below table should be CustomerId = 2

Id CustomerId Transactiondate
1 1 2022-03-01
2 1 2022_03_01
3 1 2022_03_05
4 1 2022_03_07
5 1 2022_03_07
6 2 2022_03_05
7 2 2022_03_05
8 2 2022_03_06
9 2 2022_03_06
10 2 2022_03_07

I tried this query but it doesn't have good performance for a large table:

select distinct p1.customerid
from trntbl p1
join trntbl p2 on p2.id <> p1.id
               and p2.customerid = p1.customerid
               and p2.TransactionDate >= p1.TransactionDate
               and p2.TransactionDate < date_add(day, 3, p1.prchasedate)
group by p1.customerid, p1.id
having count(*) >= 4

CodePudding user response:

If customers must have done transactions in three consecutive days (meaning that 5 transactions in a day then nothing in the next two days wouldn't count), then this can be done with two self joins:

with cte as
(select CustomerId, Transactiondate, count(*) ct
from table_name
group by CustomerId, Transactiondate)
select distinct t1.CustomerId
from cte t1  inner join cte t2
on t1.Transactiondate = dateadd(day, 1, t2.Transactiondate)
and t1.CustomerId = t2.CustomerId
inner join cte t3
on t2.Transactiondate = dateadd(day, 1, t3.Transactiondate)
and t3.CustomerId = t2.CustomerId
;

Fiddle

CodePudding user response:

This is actually a gaps and islands problem, you can solve by using analytic window functions to subtract sequential row_number from consecutive days and then grouping.

If you need to change the qualifying count or number of days simply amend the having clause accordingly:

with d as (
    select customerId, transactiondate, Count(*) qty
    from t
    group by CustomerId, Transactiondate
), groups as (
    select *,
     DateAdd(day, - row_number() 
      over (partition by customerid order by transactiondate), Transactiondate) as dt
    from d
)
select customerid
from groups
group by CustomerId, dt
having Count(*) <= 3 and Sum(qty) >= 5;

Example DB<>Fiddle

CodePudding user response:

Although this is a gaps-and-islands problem, there are shortcuts you can take.

You can group it up by date, then get the row 2 previous, and filter by only rows where the 2 previous row is exactly two days apart.

SELECT DISTINCT
  CustomerId
FROM (
    SELECT
      t.CustomerId,
      v.Date,
      Prev2 = LAG(v.Date, 2) OVER (PARTITION BY t.CustomerId ORDER BY v.Date)
    FROM YourTable t
    CROSS APPLY (VALUES( CAST(Transactiondate AS date) )) v(Date)
    GROUP BY
      t.CustomerId,
      v.Date
) t
WHERE DATEDIFF(day, t.Prev2, t.Date) = 2

db<>fiddle

If the base table only has a maximum of one row per date then you can forgo the GROUP BY.

CodePudding user response:

You could make use of datediff function and verify if the sum of the date differences are between 3 and 5 (provided the max of the differences is just 1) since the dates might be unique (for example customerid 2 can have transaction dates as 5,6,7,8,9 of March 2022) and this should be taken into account too.

declare @tbl table(id int identity,customerid int,transactiondate date)

insert into @tbl(customerid,transactiondate)
values(1,'2022-03-01')
,(1,'2022-03-01')
,(1,'2022-03-05')
,(1,'2022-03-07')
,(1,'2022-03-07')
,(2,'2022-03-05')
,(2,'2022-03-05')
,(2,'2022-03-06')
,(2,'2022-03-06')
,(2,'2022-03-07')

select customerid from (
select *
,SUM(datediff)over(partition by customerid order by transactiondate)[sum]
,max(datediff)over(partition by customerid order by transactiondate)[max]
from( 
select customerid , transactiondate,
DATEDIFF(DAY
, 
case when LEAD(transactiondate,1)over(partition by customerid order by transactiondate) 
is null then 
LAG(transactiondate,1,transactiondate)
over(partition by customerid order by transactiondate)
else
transactiondate end

, case when LEAD(transactiondate,1)over(partition by customerid order by transactiondate) 
is null then 
transactiondate
else
LEAD(transactiondate,1,transactiondate)
over(partition by customerid order by transactiondate)end) as [datediff]
,ROW_NUMBER()over(partition by customerid order by transactiondate)rownum
from @tbl
)t
)t1
where t1.rownum = 5
and t1.max = 1
and t1.sum between 3 and 5
  • Related