I have a table in SQL Server that contains customers' transactions From 2022-02-10 to 2022-03-10.
I want to find customers that have at least 5 transactions on At most three consecutive days
For example, output of below table should be CustomerId = 2
Id | CustomerId | Transactiondate |
---|---|---|
1 | 1 | 2022-03-01 |
2 | 1 | 2022_03_01 |
3 | 1 | 2022_03_05 |
4 | 1 | 2022_03_07 |
5 | 1 | 2022_03_07 |
6 | 2 | 2022_03_05 |
7 | 2 | 2022_03_05 |
8 | 2 | 2022_03_06 |
9 | 2 | 2022_03_06 |
10 | 2 | 2022_03_07 |
I tried this query but it doesn't have good performance for a large table:
select distinct p1.customerid
from trntbl p1
join trntbl p2 on p2.id <> p1.id
and p2.customerid = p1.customerid
and p2.TransactionDate >= p1.TransactionDate
and p2.TransactionDate < date_add(day, 3, p1.prchasedate)
group by p1.customerid, p1.id
having count(*) >= 4
CodePudding user response:
If customers must have done transactions in three consecutive days (meaning that 5 transactions in a day then nothing in the next two days wouldn't count), then this can be done with two self joins:
with cte as
(select CustomerId, Transactiondate, count(*) ct
from table_name
group by CustomerId, Transactiondate)
select distinct t1.CustomerId
from cte t1 inner join cte t2
on t1.Transactiondate = dateadd(day, 1, t2.Transactiondate)
and t1.CustomerId = t2.CustomerId
inner join cte t3
on t2.Transactiondate = dateadd(day, 1, t3.Transactiondate)
and t3.CustomerId = t2.CustomerId
;
CodePudding user response:
This is actually a gaps and islands problem, you can solve by using analytic window functions to subtract sequential row_number from consecutive days and then grouping.
If you need to change the qualifying count or number of days simply amend the having clause accordingly:
with d as (
select customerId, transactiondate, Count(*) qty
from t
group by CustomerId, Transactiondate
), groups as (
select *,
DateAdd(day, - row_number()
over (partition by customerid order by transactiondate), Transactiondate) as dt
from d
)
select customerid
from groups
group by CustomerId, dt
having Count(*) <= 3 and Sum(qty) >= 5;
CodePudding user response:
Although this is a gaps-and-islands problem, there are shortcuts you can take.
You can group it up by date, then get the row 2 previous, and filter by only rows where the 2 previous row is exactly two days apart.
SELECT DISTINCT
CustomerId
FROM (
SELECT
t.CustomerId,
v.Date,
Prev2 = LAG(v.Date, 2) OVER (PARTITION BY t.CustomerId ORDER BY v.Date)
FROM YourTable t
CROSS APPLY (VALUES( CAST(Transactiondate AS date) )) v(Date)
GROUP BY
t.CustomerId,
v.Date
) t
WHERE DATEDIFF(day, t.Prev2, t.Date) = 2
If the base table only has a maximum of one row per date then you can forgo the GROUP BY
.
CodePudding user response:
You could make use of datediff function and verify if the sum of the date differences are between 3 and 5 (provided the max of the differences is just 1) since the dates might be unique (for example customerid 2 can have transaction dates as 5,6,7,8,9 of March 2022) and this should be taken into account too.
declare @tbl table(id int identity,customerid int,transactiondate date)
insert into @tbl(customerid,transactiondate)
values(1,'2022-03-01')
,(1,'2022-03-01')
,(1,'2022-03-05')
,(1,'2022-03-07')
,(1,'2022-03-07')
,(2,'2022-03-05')
,(2,'2022-03-05')
,(2,'2022-03-06')
,(2,'2022-03-06')
,(2,'2022-03-07')
select customerid from (
select *
,SUM(datediff)over(partition by customerid order by transactiondate)[sum]
,max(datediff)over(partition by customerid order by transactiondate)[max]
from(
select customerid , transactiondate,
DATEDIFF(DAY
,
case when LEAD(transactiondate,1)over(partition by customerid order by transactiondate)
is null then
LAG(transactiondate,1,transactiondate)
over(partition by customerid order by transactiondate)
else
transactiondate end
, case when LEAD(transactiondate,1)over(partition by customerid order by transactiondate)
is null then
transactiondate
else
LEAD(transactiondate,1,transactiondate)
over(partition by customerid order by transactiondate)end) as [datediff]
,ROW_NUMBER()over(partition by customerid order by transactiondate)rownum
from @tbl
)t
)t1
where t1.rownum = 5
and t1.max = 1
and t1.sum between 3 and 5