I have a table that looks like this:
Week | Value1 | Value2 | Value3
-----|--------|--------|-------
1 | A | A | 5
1 | A | B | 10
1 | B | B | 15
2 | A | A | 10
2 | A | B | 15
2 | B | B | 20
2 | A | A | 10
2 | A | B | 15
2 | B | B | 25
I want to remove the surplus rows where Week, Value1, and Value2 match, but leave the original rows intact. So the desired result would look like this:
Week | Value1 | Value2 | Value3
-----|--------|--------|-------
1 | A | A | 5
1 | A | B | 10
1 | B | B | 15
2 | A | A | 10
2 | A | B | 15
2 | B | B | 20
I can select the data I need to remove using this code:
select *
from (
select *, row_number() over(partition by week, value1, value2 order by week, value1, value2) as row
from mytable
)
where row >1
Does anyone know how best to go about actually removing these rows from the table please?
I'm on a db2 database, if that helps narrow down what functions do/don't work.
CodePudding user response:
Unfortunately, removing exact duplicates is tricky in many databases, and I think in DB2 as well. One option is to add a new column that uniquely identifies each row. Another is to recreate the table:
create temporary table temp_mytable as
select distinct week, value1, value2
from mytable;
truncate mytable;
insert into mytable (week, value1, value2)
select week, value1, value2
from temp_mytable;
If you had a unique id for each row, you would simply use:
delete from mytable
where id <> (select min(t2.id)
from mytable t2
where t2.week = t.week and t2.value1 = t.value1 and t2.value2 = t.value2
);
CodePudding user response:
You want to delete rows for which exists a sibling with a lower value3 (so as to only keep the row with the lowest value3).
delete from mytable
where exists
(
select null
from mytable better_row
where better_row.week = mytable.week
and better_row.value1 = mytable.value1
and better_row.value2 = mytable.value2
and better_row.value3 < mytable.value3
);
CodePudding user response:
Try the following. It works on Db2 for LUW at least.
DELETE FROM
(
SELECT ROW_NUMBER () OVER (PARTITION BY WEEK, VALUE1, VALUE2 ORDER BY VALUE3) RN_
FROM MYTABLE
)
WHERE RN_ <> 1;
CodePudding user response:
Use a DISTINCT
clause.
DISTINCT clause is used to remove duplicates from the result set.