I have a very huge table where I need to find duplicate rows. In order to find them I run:
SELECT firstcolumn, count(*) AS c FROM invoice GROUP BY firstcolumn HAVING c > 1 ORDER BY c DESC
This works as expected: I get a table with firstcolumn in one column and the total number of rows that holds firstcolumn's duplicate value.
Now, I need to add a new column to this result table: a new column "ids" which lists all the ids of the rows that have the same firstcolumn's value. So for instance, if my original table has 3 rows with the same firstcolumn value (eg. "somerandomstring") and this 3 rows have 11,32,356 as ids, I should end up with a result table like this:
"somerandomstring",3,"11,32,356"
So, how can I modify my query so that I can have the list of ids in my third column ?
I tried modifying my query as such:
SELECT firstcolumn, count(*) AS c, (SELECT secondid FROM invoice AS so WHERE so.firstcolumn = si.firstcolumn) AS secondcolumn
FROM invoice AS si
WHERE firstcolumn != ""
GROUP BY firstcolumn HAVING c > 1 ORDER BY c DESC
but it doesn't work. No error appear and the query just keeps loading
CodePudding user response:
GROUP_CONCAT to the rescue:
SELECT firstcolumn,
count(*) AS c,
group_concat(secondid) as ids
FROM invoice AS si
WHERE firstcolumn != ""
GROUP BY firstcolumn
HAVING c > 1
ORDER BY c DESC