How can I delete trailing contiguous records in a partition with a particular value?

I'm using the latest version of SQL Server and have the following problem. Given the table below, the requirement, quite simply, is to delete "trailing" records in each _category partition that have _value = 0. Trailing in this context means, when the records are placed in _date order, any series or contiguous block of records with _value = 0 at the end of the list should be deleted. Records with _value = 0 that have subsequent records in the partition with some non-zero value should stay.

create table #x (_id int identity, _category int, _date date, _value int)

insert into #x values (1, '2022-10-01', 12)
insert into #x values (1, '2022-10-03', 0)
insert into #x values (1, '2022-10-04', 10)
insert into #x values (1, '2022-10-06', 11)
insert into #x values (1, '2022-10-07', 10)

insert into #x values (2, '2022-10-01', 1)
insert into #x values (2, '2022-10-02', 0)
insert into #x values (2, '2022-10-05', 19)
insert into #x values (2, '2022-10-10', 18)
insert into #x values (2, '2022-10-12', 0)
insert into #x values (2, '2022-10-13', 0)
insert into #x values (2, '2022-10-15', 0)

insert into #x values (3, '2022-10-02', 10)
insert into #x values (3, '2022-10-03', 0)
insert into #x values (3, '2022-10-05', 0)
insert into #x values (3, '2022-10-06', 12)
insert into #x values (3, '2022-10-08', 0)

I see a few ways to do it. The brute force way is to run the records through a cursor in date order, grab the ID of any record where _value = 0, and see if it holds until the category changes. I'd rather avoid procedural T-SQL, though, if I can do it in a single query.

To that end, I thought I could apply some gaps and islands trickery and do something with window functions. I feel like there might be a way to leverage last_value() for this, but so far I only see it useful in identifying partitions that meet the criteria, not so much in helping me get the IDs of the records to delete.

The desired result is the deletion of records 10, 11, 12 and 17.

Appreciate any help.

CodePudding user response：

I'm not sure that your requirement requires a gaps and islands approach. Simple exists logic should work.

SELECT _id, _category, _date, _value
FROM #x x1
WHERE _value <> 0 OR
    EXISTS (
        SELECT 1
        FROM #x x2
        WHERE x2._category = x1._category AND
              x2._date > x1._date AND
              x2._value <> 0
    );
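The query above returns the rows to keep. Since the question asks for a delete, the same logic can be inverted. The following is a minimal sketch rather than part of the original answer, and it assumes _value is never NULL (a NULL _value would neither be deleted nor count as a later non-zero row):

```sql
-- Delete zero-valued rows that have no later non-zero row in the same category.
-- Assumption: _value is NOT NULL for every row in #x.
DELETE x1
FROM #x AS x1
WHERE x1._value = 0
  AND NOT EXISTS (
        SELECT 1
        FROM #x AS x2
        WHERE x2._category = x1._category
          AND x2._date > x1._date
          AND x2._value <> 0
  );
```

Against the sample data in the question, this should remove the rows with _id 10, 11, 12 and 17.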

CodePudding user response：

Using common table expressions, you can use:

WITH CTE_NumberedRows AS (
    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY _category ORDER BY _date)
    FROM #x
),
CTE_Keepers AS (
    SELECT _category, rnLastKeeper = MAX(rn)
    FROM CTE_NumberedRows
    WHERE _value <> 0
    GROUP BY _category
)
DELETE NR
FROM CTE_NumberedRows NR
LEFT JOIN CTE_Keepers K
    ON K._category = NR._category
WHERE NR.rn > ISNULL(K.rnLastKeeper, 0)

See this db<>fiddle for a working demo.
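If you want to check which rows would be affected before running the delete, one option is to keep the same two CTEs and swap the DELETE for a SELECT. This is only a sketch of that idea, reusing the names from the query above:

```sql
-- Preview: list the rows the DELETE would remove, without changing #x.
WITH CTE_NumberedRows AS (
    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY _category ORDER BY _date)
    FROM #x
),
CTE_Keepers AS (
    -- Position of the last non-zero row in each category
    SELECT _category, rnLastKeeper = MAX(rn)
    FROM CTE_NumberedRows
    WHERE _value <> 0
    GROUP BY _category
)
SELECT NR._id, NR._category, NR._date, NR._value
FROM CTE_NumberedRows NR
LEFT JOIN CTE_Keepers K
    ON K._category = NR._category
-- ISNULL(..., 0) covers categories where every row is zero
WHERE NR.rn > ISNULL(K.rnLastKeeper, 0)
ORDER BY NR._id;
```

With the sample data from the question, this should list _id 10, 11, 12 and 17.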

EDIT: My original post did not handle the all-zeros edge case. This has been corrected above, together with some naming tweaks. (The original can still be found here.)

Tim Biegeleisen's post may be the simpler approach.

CodePudding user response：

Assuming that all _values are greater than or equal to 0, you can use the MAX() window function in an updatable CTE:

WITH cte AS (
  SELECT *, 
         MAX(_value) OVER (
           PARTITION BY _category 
           ORDER BY _date 
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
         ) max
  FROM #x
)  
DELETE FROM cte
WHERE max = 0;

See the demo.
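If negative values are possible, the non-negativity assumption can be dropped by taking the MAX() of a 0/1 flag instead of the raw value. This is a sketch of a variant, not the original answer's code; the column name has_nonzero_from_here is made up for illustration:

```sql
WITH cte AS (
  SELECT *,
         -- 1 if this row or any later row in the category is non-zero, else 0.
         -- Note: a NULL _value falls into the ELSE branch and is treated like zero.
         MAX(CASE WHEN _value <> 0 THEN 1 ELSE 0 END) OVER (
           PARTITION BY _category
           ORDER BY _date
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
         ) AS has_nonzero_from_here
  FROM #x
)
DELETE FROM cte
WHERE has_nonzero_from_here = 0;
```

As with the original, this relies on _date being unique within each category. With duplicate dates, the row order inside a ROWS frame is not deterministic.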