Home > Enterprise >  How to delete duplicate records based on two unique columns in RedShift
How to delete duplicate records based on two unique columns in RedShift

Time:09-16

I have the following table redshift.

guest_id name rownum
1 Safvan 1
1 Safvan 2
1 Thomas 3
2 Anandu 1
2 Manish 2

I need to delete all the records in each partition based on guest_id except the record having max(rownum).

The result should be like

guest_id name rownum
1 Thomas 3
2 Manish 2

Thanks in advance for valuable helps.

CodePudding user response:

Subquery returns guest_id wise max row then JOIN with main table where matching guest_id and max_row not equal row_num then perform DELETE.

DELETE redshift
FROM redshift r
INNER JOIN (SELECT guest_id
                 , MAX(rownum) rownum
           FROM redshift
           GROUP BY guest_id) t
        ON r.guest_id = t.guest_id
       AND r.rownum != t.rownum

Please check from https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=47081e3517000949460932808ac9f09d

Delete duplicate records by using CTE

WITH t_cte AS (
       SELECT *
            , ROW_NUMBER() OVER (PARTITION BY guest_id ORDER BY rownum DESC) row_num
       FROM redshift
)
DELETE
FROM t_cte
WHERE row_num > 1

Please check from url https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=86afb821ec0fe255333fd1cd01625f5a

CodePudding user response:

My solution is :

create table table_rownum as (
select
    *,
    row_number() over (partition by guest_id
order by
    rownum desc) as rownum_temp
from
    table_orig);

delete from table_rownum where rownum_temp<>1;

alter table table_rownum drop column rownum_temp;

truncate table table_orig;

insert into table_orig (select * from table_rownum);

drop table table_rownum;

Please suggest if there is better solution.

CodePudding user response:

Delete from table name where rownum not in (select max(rownum) from table name groupby column)

  • Related