I have a postgresql database, and I'm trying to delete (or even just get the ids) of the older of the duplicates I have in my table, but only those who are because of case sensitivity, for example helLo
and hello
.
The table is quite large and my nested query takes a really long time, I wonder if there is a better, more efficient way to do my query in one go, and not split it up to multiple queries, cause there's a lot of ids in question
SELECT * FROM some_table AS out
WHERE (SELECT count(*) FROM some_table AS in
WHERE out.text != in.text
AND LOWER(in.text) = LOWER(out.text)
AND in.created_at > out.created_at) > 1
Thanks!
CodePudding user response:
Can you try
SELECT LOWER(text), ROW_NUMBER() OVER( PARTITION by LOWER(text), ORDER by created_at ) as rn
FROM some_table
You can then use the rn
column as a filter
CodePudding user response:
To help this query, create an expression index on LOWER(text)
. Include created_at
in the index to help the date comparisons.
CREATE INDEX text_lower ON some_table(LOWER(text), created_at);
It's hard to test this without your data, though.