Im trying to select *
from all duplicate rows in users, where a duplicate is defined as two users sharing the same first_name
and last_name
. (I need to process the other columns that might differ)
Im using MySQL 8.0.28.
My first try was to literally translate my requirement:
select * from `users` AS u1 where exists (select 1 from `users` AS u2 WHERE `u2`.`first_name` = `u1`.`first_name` AND `u2`.`last_name` = `u1`.`last_name` AND `u2`.`id` != `u1`.`id`)
Which, obviously, has a horrendous execution time.
My current query is
SELECT * from users where Concat(first_name," ",last_name) IN (select Concat(first_name," ",last_name) from `users` GROUP BY first_name, last_name HAVING COUNT(*)>1)
which is vastly more efficient, but still takes more than 100ms for 8000 records. I suppose a solution that doesn't use concat could benefit from indicies and would not need to calculate the result for each row.
Also, I couldn't get group by
to work because I need so select all columns of all rows that are duplicates, not just the distinct first_name
's and last_name
's. Also because I don't want to disable ONLY_FULL_GROUP_BY
(not sure if disabling that would help anyway).
Is there a more efficient, proper way to select these duplicate rows?
CodePudding user response:
I would just use an aggregation approach here:
SELECT *
FROM users
WHERE (first_name, last_name) IN (
SELECT first_name, last_name
FROM users
GROUP BY 1, 2
HAVING COUNT(*) > 1
);
On MySQL 8 , we can also use COUNT()
as an analytic function here:
WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY first_name, last_name) AS cnt
FROM users
)
SELECT *
FROM cte
WHERE cnt > 1;