Redshift Delete Duplicate Rows-CodePudding

I want to delete duplicates from a Redshift table that are true duplicates. Below is an example of two rows that are true duplicates.

Since it is Redshift, there are no primary keys to the table. Any help is appreciated.

id	Col 1	Col 2
1	Val 1	Val 2
1	Val 1	Val 2

I tried using window functions row_number(), rank(). Neither worked as when applying Delete command, SQL command cannot differentiate both rows.

Trial 1:

The below command deletes both rows

DELETE From test_table
where (id) IN
(
  select \*,row_number() over(partition by id) as rownumber from test*table where row*number !=1
);

Trial 2:

The below command retains both rows.

DELETE From test_table
where (id) IN
(
   select \*,rank() over(partition by id) as rownumber from test*table where row*number !=1
);

CodePudding user response：

All row values are identical. Hence you unable to delete specific rows in that table.

In that I would recommend to create dummy table, and load unique records.

Steps to follow:

create table dummy as select * from main_table where 1=2
insert into dummy(col1,col2..coln) select distinct col1,col2..coln from main_table;
verify dummy table.
Alter table main_table rename to main_table_bk
alter table dummy rename to main.
after complete your testing and verification, drop main_table_bk

Hope it will help.

CodePudding user response：

You cannot delete one without deleting the other as they are identical. The way to do this is to:

make a temp table with (one copy) of each duplicate row
(within a transaction) delete all rows from the source table that match rows in the temp table
Insert temp table rows into source table (commit)