SQL to delete surplus rows where values are the same?-CodePudding

I have a table that looks like this:

Week | Value1 | Value2 | Value3
-----|--------|--------|-------
1    | A      | A      | 5
1    | A      | B      | 10
1    | B      | B      | 15
2    | A      | A      | 10
2    | A      | B      | 15
2    | B      | B      | 20
2    | A      | A      | 10
2    | A      | B      | 15
2    | B      | B      | 25

I want to remove the surplus rows where Week, Value1, and Value2 match, but leave the original rows intact. So the desired result would look like this:

Week | Value1 | Value2 | Value3
-----|--------|--------|-------
1    | A      | A      | 5
1    | A      | B      | 10
1    | B      | B      | 15
2    | A      | A      | 10
2    | A      | B      | 15
2    | B      | B      | 20

I can select the data I need to remove using this code:

select  *
from    (
        select  *, row_number() over(partition by week, value1, value2 order by week, value1, value2) as row
        from    mytable
        )
where   row >1

Does anyone know how best to go about actually removing these rows from the table please?

I'm on a db2 database, if that helps narrow down what functions do/don't work.

CodePudding user response：

Unfortunately, removing exact duplicates is tricky in many databases, and I think in DB2 as well. One option is to add a new column that uniquely identifies each row. Another is to recreate the table:

create temporary table temp_mytable as
    select distinct week, value1, value2
    from mytable;

truncate mytable;

insert into mytable (week, value1, value2)
    select week, value1, value2
    from temp_mytable;

If you had a unique id for each row, you would simply use:

delete from mytable
    where id <> (select min(t2.id)
                 from mytable t2
                 where t2.week = t.week and t2.value1 = t.value1 and t2.value2 = t.value2
                );

CodePudding user response：

You want to delete rows for which exists a sibling with a lower value3 (so as to only keep the row with the lowest value3).

delete from mytable
where exists
(
  select null
  from mytable better_row
  where better_row.week = mytable.week
  and better_row.value1 = mytable.value1
  and better_row.value2 = mytable.value2
  and better_row.value3 < mytable.value3
);

CodePudding user response：

Try the following. It works on Db2 for LUW at least.

DELETE FROM
(
SELECT ROW_NUMBER () OVER (PARTITION BY WEEK, VALUE1, VALUE2 ORDER BY VALUE3) RN_ 
FROM MYTABLE
)
WHERE RN_ <> 1;

CodePudding user response：

Use a DISTINCT clause.
DISTINCT clause is used to remove duplicates from the result set.