I'm having great difficulty writing this query and cannot find any answers online which could be applied to my problem.
I have a couple of tables which looks similar to the below with. Each purchase date corresponds with an item purchased.
Cust_ID | Purchase_Date |
---|---|
123 | 08/01/2022 |
123 | 08/20/2022 |
123 | 09/05/2022 |
123 | 10/08/2022 |
123 | 12/25/2022 |
123 | 01/26/2023 |
The result I am looking for should contain the customers ID, a range of the purchases, the number of consecutive months they had made a purchase (regardless of which day they purchased), and a count of how many purchases they had made in the time frame. The result should look something like the below for my example.
Cust_ID | Min Purchase Date | Max Purchase Date | Consecutive Months | No. Items Purchased |
---|---|---|---|---|
123 | 08/01/2022 | 10/08/2022 | 3 | 4 |
123 | 12/25/2022 | 01/26/2023 | 2 | 2 |
I have tried using CTEs with querys similar to
WITH CTE as
(
SELECT
PaymentDate PD,
CustomerID CustID,
DATEADD(m, -ROW_NUMBER() OVER (PARTITION BY c.CustomerID ORDER BY
DATEPART(m,PaymentDate)), PaymentDate) as TempCol1,
FROM customers as c
LEFT JOIN payments as p on c.customerid = p.customerid
GROUP BY c.CustomerID, p.PaymentDate
)
SELECT
CustID,
MIN(PD) AS MinPaymentDate,
MAX(PD) AS MaxPaymentDate,
COUNT(*) as ConsecutiveMonths,
FROM CTE
GROUP BY CustID, TempCol1
However, the above failed to properly count consecutive months. When the payment dates matched a month apart (e.g. 1/1/22 - 2/1/22), the query properly counts the consecutive months. However, if the dates do not match from month to month (e.g. 1/5/22 - 2/15/22), the count breaks.
Any guidance/help would be much appreciated!
CodePudding user response:
This is just a small enhancement on the answer already given by ahmed. If your date range for this query is more than a year, then year(M.Purchase_Date) month(M.Purchase_Date)
will be 2024
for both 2022-02-01
and 2023-01-01
as YEAR() and MONTH() both return integer values. This will return incorrect count of consecutive months. You can change this to use CONCAT() or FORMAT(). Also, the COUNT(*)
for ItemsPurchased
should be counting the right hand side of the join, as it is a LEFT JOIN.
WITH consecutive_months AS
(
SELECT *,
DATEADD(
month,
-DENSE_RANK() OVER (
PARTITION BY CustomerID
ORDER BY YEAR(PaymentDate), MONTH(PaymentDate)
),
PaymentDate
) AS grp_date
FROM payments
)
SELECT
C.CustomerID AS CustID,
MIN(M.PaymentDate) AS MinPaymentDate,
MAX(M.PaymentDate) AS MaxPaymentDate,
COUNT(DISTINCT FORMAT(M.PaymentDate, 'yyyyMM')) AS ConsecutiveMonths,
COUNT(M.CustomerID) AS ItemsPurchased
FROM customers C
LEFT JOIN consecutive_months M
ON C.CustomerID = M.CustomerID
GROUP BY C.CustomerID, YEAR(M.grp_date), MONTH(M.grp_date)
Here's a db<>fiddle
CodePudding user response:
You need to use the dense_rank
function instead of the row_number
, this will give the same rank for the same months and avoid breaking the grouping column. Also, you need to aggregate for 'year-month' of the grouping date column.
with consecutive_months as
(
select *,
Purchase_Date - interval
dense_rank() over (partition by Cust_ID order by year(Purchase_Date), month(Purchase_Date))
month as grp_date
from payments
)
select C.Cust_ID,
min(M.Purchase_Date) as MinPurchaseDate,
max(M.Purchase_Date) as MaxPurchaseDate,
count(distinct year(M.Purchase_Date), month(M.Purchase_Date)) as ConsecutiveMonthsNo,
count(*) as ItemsPurchased
from customers C left join consecutive_months M
on C.Cust_ID = M.Cust_ID
group by C.Cust_ID, year(M.grp_date), month(M.grp_date)
See demo on MySQL
You tagged your question with MySQL, while it seems that you posted an SQL Server query syntax, for SQL Server just use dateadd(month, -dense_rank() over (partition by Cust_ID order by year(Purchase_Date), month(Purchase_Date)), Purchase_Date)
.
See demo on SQL Server.