Filtering Data Based on Whether Zeroes Ever Appear (Across Many Rows)-CodePudding

Consider the following table.

CREATE TABLE my_table
(
    cost decimal(10,2),
    ref int
);

INSERT INTO my_table(cost, ref)
VALUES      (0.00, 1),
            (0.50, 1),
            (-1.89, 1),
            (0.00, 2),
            (0.00, 2),
            (0.00, 2),
            (1.23, 3),
            (-9.47, 3),
            (111.23, 3),
            (12.00, 3);

My aim with this table is to filter it to remove all rows where ref = 2 (rows 4, 5, and 6 in the above), because all of the cost values where ref = 2 are zero.

I've got halfway there by working out which values of ref have AT LEAST one row where cost is not zero:

SELECT DISTINCT ref
INTO            refs_to_keep
FROM            my_table
WHERE           cost != 0.00;

So what I'd like then like to do to finish the job is something like

DELETE FROM my_table
WHERE ref NOT IN "list of refs in the refs_to_keep table";

But I don't know the syntax for that. Can anyone help me out please?

Or maybe there's another way to achieve this without the intermediate refs_to_keep stage?

CodePudding user response：

You can use the windowed function SUM to find the rows with a total of 0:

DECLARE @my_table TABLE (cost decimal(10,2), ref int);

INSERT INTO @my_table(cost, ref) VALUES      (0.00, 1),
(0.50, 1), (1.89, 1), (0.00, 2), (0.00, 2),
(0.00, 2), (1.23, 3), (9.47, 3), (111.23, 3),
(12.00, 3);

DELETE a
  FROM (
        SELECT *, SUM(cost) OVER (PARTITION BY ref ORDER BY (SELECT 1)) as sumCost
          FROM @my_table
       ) a
 WHERE sumCost = 0;

SELECT *
  FROM @my_table;

cost	ref
0.00	1
0.50	1
1.89	1
1.23	3
9.47	3
111.23	3
12.00	3

Here we're using the windowed SUM function to find the rows with a total of 0 (ORDER BY (SELECT 1) is a kludge to total each row). Once we know the rows, they can be deleted.

Edit:

DECLARE @my_table TABLE (cost decimal(10,2), ref int);

INSERT INTO @my_table(cost, ref) VALUES      (0.00, 1),
(0.50, 1), (1.89, 1), (0.00, 2), (0.00, 2),
(0.00, 2), (1.23, 3), (-9.47, 3), (111.23, 3),
(12.00, 3), (10,4), (-10,4);

DELETE a
  FROM (
        SELECT *, MAX(cost) OVER (PARTITION BY ref ORDER BY (SELECT 1)) as maCost, MIN(cost) OVER (PARTITION BY ref ORDER BY (SELECT 1)) as miCost
          FROM @my_table
       ) a
 WHERE miCost = 0
   AND maCost = 0;

SELECT *
  FROM @my_table;

cost	ref
0.00	1
0.50	1
1.89	1
1.23	3
-9.47	3
111.23	3
12.00	3
10.00	4
-10.00	4

This now looks for a MIN and MAX of 0, using a similar technique. I added a 4th set of values where one was 10 and the other -10 to provide an example of a set which would have been deleted using the previous method.

CodePudding user response：

Yeah you are pretty much there already although you don't need to create the temp table you can just use a sub-query:

delete from my_table
where ref not in 
(
  select distinct ref
  from my_table
  where cost <> 0
)

You can see it working here: https://www.db-fiddle.com/f/fAYuenfopm4BkYFRNMKY9w/0

CodePudding user response：

Presumably you could simply delete rows where there are no non-zero lines for a ref?

delete from t
from my_table t
where not exists (
  select * from my_table t2 
  where t2.ref = t.ref and t2.cost != 0
);

See fiddle