Remove case insensitive duplicates in sql (postgres)-CodePudding

I have a postgresql database, and I'm trying to delete (or even just get the ids) of the older of the duplicates I have in my table, but only those who are because of case sensitivity, for example helLo and hello.

The table is quite large and my nested query takes a really long time, I wonder if there is a better, more efficient way to do my query in one go, and not split it up to multiple queries, cause there's a lot of ids in question

SELECT * FROM some_table AS out
WHERE (SELECT count(*) FROM some_table AS in
    WHERE out.text != in.text 
    AND LOWER(in.text) = LOWER(out.text) 
    AND in.created_at > out.created_at) > 1

Thanks!

CodePudding user response：

Can you try

SELECT LOWER(text), ROW_NUMBER() OVER( PARTITION by LOWER(text), ORDER by created_at ) as rn
FROM some_table

You can then use the rn column as a filter

CodePudding user response：

To help this query, create an expression index on LOWER(text). Include created_at in the index to help the date comparisons.

CREATE INDEX text_lower ON some_table(LOWER(text), created_at);

It's hard to test this without your data, though.