Home > other >  Filtering Data Based on Whether Zeroes Ever Appear (Across Many Rows)
Filtering Data Based on Whether Zeroes Ever Appear (Across Many Rows)

Time:11-05

Consider the following table.

CREATE TABLE my_table
(
    cost decimal(10,2),
    ref int
);

INSERT INTO my_table(cost, ref)
VALUES      (0.00, 1),
            (0.50, 1),
            (-1.89, 1),
            (0.00, 2),
            (0.00, 2),
            (0.00, 2),
            (1.23, 3),
            (-9.47, 3),
            (111.23, 3),
            (12.00, 3);

My aim with this table is to filter it to remove all rows where ref = 2 (rows 4, 5, and 6 in the above), because all of the cost values where ref = 2 are zero.

I've got halfway there by working out which values of ref have AT LEAST one row where cost is not zero:

SELECT DISTINCT ref
INTO            refs_to_keep
FROM            my_table
WHERE           cost != 0.00;

So what I'd like then like to do to finish the job is something like

DELETE FROM my_table
WHERE ref NOT IN "list of refs in the refs_to_keep table";

But I don't know the syntax for that. Can anyone help me out please?

Or maybe there's another way to achieve this without the intermediate refs_to_keep stage?

CodePudding user response:

You can use the windowed function SUM to find the rows with a total of 0:

DECLARE @my_table TABLE (cost decimal(10,2), ref int);

INSERT INTO @my_table(cost, ref) VALUES      (0.00, 1),
(0.50, 1), (1.89, 1), (0.00, 2), (0.00, 2),
(0.00, 2), (1.23, 3), (9.47, 3), (111.23, 3),
(12.00, 3);

DELETE a
  FROM (
        SELECT *, SUM(cost) OVER (PARTITION BY ref ORDER BY (SELECT 1)) as sumCost
          FROM @my_table
       ) a
 WHERE sumCost = 0;

SELECT *
  FROM @my_table;
cost ref
0.00 1
0.50 1
1.89 1
1.23 3
9.47 3
111.23 3
12.00 3

Here we're using the windowed SUM function to find the rows with a total of 0 (ORDER BY (SELECT 1) is a kludge to total each row). Once we know the rows, they can be deleted.

Edit:

DECLARE @my_table TABLE (cost decimal(10,2), ref int);

INSERT INTO @my_table(cost, ref) VALUES      (0.00, 1),
(0.50, 1), (1.89, 1), (0.00, 2), (0.00, 2),
(0.00, 2), (1.23, 3), (-9.47, 3), (111.23, 3),
(12.00, 3), (10,4), (-10,4);

DELETE a
  FROM (
        SELECT *, MAX(cost) OVER (PARTITION BY ref ORDER BY (SELECT 1)) as maCost, MIN(cost) OVER (PARTITION BY ref ORDER BY (SELECT 1)) as miCost
          FROM @my_table
       ) a
 WHERE miCost = 0
   AND maCost = 0;

SELECT *
  FROM @my_table;
cost ref
0.00 1
0.50 1
1.89 1
1.23 3
-9.47 3
111.23 3
12.00 3
10.00 4
-10.00 4

This now looks for a MIN and MAX of 0, using a similar technique. I added a 4th set of values where one was 10 and the other -10 to provide an example of a set which would have been deleted using the previous method.

CodePudding user response:

Yeah you are pretty much there already although you don't need to create the temp table you can just use a sub-query:

delete from my_table
where ref not in 
(
  select distinct ref
  from my_table
  where cost <> 0
)

You can see it working here: https://www.db-fiddle.com/f/fAYuenfopm4BkYFRNMKY9w/0

CodePudding user response:

Presumably you could simply delete rows where there are no non-zero lines for a ref?

delete from t
from my_table t
where not exists (
  select * from my_table t2 
  where t2.ref = t.ref and t2.cost != 0
);

See fiddle

  • Related