Delete first occurrence duplicate row in postgres-CodePudding

I have the following table

ID   VALUE1  VALUE2
1    aaa     bbb
1    aaa     bbb

Sadly, my table has no primmary keys and I want to create it. But first I need to delete only the first occurrance of duplicated rows. I don't have any unique identifier, the rows are exactly equal.

I know that I can use something like this if I have an unique identifier:

DELETE FROM
    table
WHERE
    table.id < table.id

But, what is the best approach to this if I don't have a unique identifier?

My table has 500 million lines, about 100k are duplicated.

CodePudding user response：

If I understand the question correctly, you can proceed using Row_Number():

with dataset as (select 1 as ID, 'aaa' as VALUE1, 'bbb' as VALUE2
     union all select 1, 'aaa', 'bbb'
     union all select 2, 'ccc', 'ddd'
     union all select 2, 'ccc', 'ddd')

select *, row_number() over (partition by id, value1, value2) from dataset;

This will create the following column row_number:

 id | value1 | value2 | row_number 
---- -------- -------- ------------
  1 | aaa    | bbb    |          1
  1 | aaa    | bbb    |          2
  2 | ccc    | ddd    |          1
  2 | ccc    | ddd    |          2

and then you can delete * where row_number = 1;