I am looking to remove the rows (highlighted in yellow) that exist prior to the FirstOrder
flag (partitioned by CustomerID
).
The business case is that I need to remove the rows for each customer that has a row before their First Order date (ASOFDATE where FirstOrder=1).
I have attempted the creation of another column that coalesces the minimum date per customer and the minimum date per FirstOrder, to create a kind of range, but where I was stymied was how to account for the NULL's in between those dates. The one idea I had was to make those in between dates the same as the first date, then I could simply remove rows where this proposed "reference date" < (asofdate where firstorder = 1), but I'm not having luck getting that to work:
I'm sure the solution is pretty simple and I'm just not seeing it. Here's the sample data:
CREATE TABLE #DATA (ASOFDATE DATE, CUSTOMERID INT, ACTIVE BIT, FIRSTORDER BIT)
INSERT INTO #DATA VALUES
('2018-01-31', 206424, NULL, NULL),
('2018-02-28', 206424, NULL, NULL),
('2018-03-31', 206424, 1, 1),
('2022-06-30', 206424, NULL, NULL),
('2022-07-31', 206424, NULL, NULL),
('2018-06-30', 247034, NULL, NULL),
('2018-07-31', 247034, NULL, NULL),
('2018-08-31', 247034, 1, 1),
('2022-05-31', 247034, NULL, NULL)
CodePudding user response:
Using a conditional sum() over() may help
Example
;with cte as (
Select *
,Flag = sum( case when FirstOrder=1 then 1 else 0 end ) over (partition by CustomerID order by asofdate)
from #Data
)
Delete from cte where Flag=0
Updated Table
ASOFDATE CUSTOMERID ACTIVE FIRSTORDER
2018-03-31 206424 1 1
2022-06-30 206424 NULL NULL
2022-07-31 206424 NULL NULL
2018-08-31 247034 1 1
2022-05-31 247034 NULL NULL
CodePudding user response:
Yet another working option is matching your table with a selection of your table where "FIRSTORDER" is true. Then remove all "CUSTOMERID" matching rows which have a lower "ASOFDATE" than the corresponding "FIRSTORDER".
DELETE #DATA
FROM #DATA
INNER JOIN (SELECT ASOFDATE, CUSTOMERID FROM #DATA WHERE FIRSTORDER = 'True') firstOrders
ON #DATA.CUSTOMERID = firstOrders.CUSTOMERID
AND #DATA.ASOFDATE < firstOrders.ASOFDATE;
Check the demo here.
CodePudding user response:
Try:
WITH StartDates AS (
SELECT CUSTOMERID, MIN(ASOFDATE) AS StartDate
FROM #DATA
WHERE FIRSTORDER = 1
GROUP BY CUSTOMERID
)
DELETE D
FROM StartDates S
JOIN #DATA D
ON D.CUSTOMERID = S.CUSTOMERID
AND D.ASOFDATE < S.StartDate
See This db<>fiddle
The CTE (common table expression) first picks the FIRSTORDER reference date for each customer, and the main query then uses that information to select the records to be deleted.