Home > other >  How to find duplicate group records in SQL
How to find duplicate group records in SQL

Time:08-24

I have a table:

Group_ID Group_Creator_ID Group_Member_ID
A01 P01 J01
A01 P01 K01
A01 P01 L01
B01 S01 X01
B01 S01 Y01
C01 P01 J01
C01 P01 K01
C01 P01 L01
D01 S01 J01
D01 S01 K01
E01 z01 J01
E01 z01 K01
E01 z01 L01

I want to display the same Group_ID where they have the exact Group_Member_ID records.

Expected result:

Group_ID
A01
C01
E01

(because they all share the same J01 K01 L01 records)

Thanks a lot !

CodePudding user response:

I would group by Group_ID and assemble all values of Group_Member_ID into a list.

SELECT Group_ID
, listagg(distinct  Group_Member_ID, ',')
WITHIN group (order by Group_Member_ID ASC) as Group_Member_ID_listagg
FROM My_First_Table
GROUP BY Group_ID

Link to SQL Generator

Once you had this, then you could query the unique combinations and then do the same thing... get a unique combination and the Group_ID as a list.

SELECT Group_Member_ID_listagg
, listagg(distinct  Group_ID, ',')
WITHIN group (order by Group_ID ASC) as Group_ID_listagg
FROM Result_From_First_step
GROUP BY Group_Member_ID_listagg

Link to SQL Generator

To tie it all together, I prefer writing with the style of CTE:

WITH CTE_LISTS AS (
  SELECT Group_ID
  , listagg(distinct  Group_Member_ID, ',')
  WITHIN group (order by Group_Member_ID ASC) as Group_Member_ID_listagg
  FROM My_First_Table
  GROUP BY Group_ID
)
SELECT Group_Member_ID_listagg
, listagg(distinct  Group_ID, ',')
WITHIN group (order by Group_ID ASC) as Group_ID_listagg
FROM CTE_LISTS
GROUP BY Group_Member_ID_listagg

You might consider tagging your post with the appropriate tag to specify what RDBMS you're using. I posted using Snowflake SQL so if you're using something different, the syntax might be slightly different.

CodePudding user response:

You can aggregate Group_Member_ID values ​​into a string for each Group_ID, and then select those Group_ID values ​​for which at least one other exists with the same Group_Member_ID value string.

For SQL Server a query can be like this

WITH group_members AS (
    SELECT 
       group_id,
       STRING_AGG(group_member_id, ',') WITHIN GROUP (ORDER BY group_member_id ASC) AS members
    FROM data
    GROUP  BY group_id
)
SELECT 
    group_id
FROM group_members gm
WHERE EXISTS (
    SELECT 1 FROM group_members gm1 WHERE gm1.group_id != gm.group_id AND gm1.members = gm.members
)

fiddle


For PostgreSQL it can be like this

WITH group_members AS (
    SELECT 
       group_id,
       STRING_AGG(group_member_id, ',' ORDER BY group_member_id ASC) AS members
    FROM data
    GROUP  BY group_id
)
SELECT 
    group_id
FROM group_members gm
WHERE EXISTS (
    SELECT 1 FROM group_members gm1 WHERE gm1.group_id != gm.group_id AND gm1.members = gm.members
)

fiddle


For MySQL 8.0 it can be like this

WITH group_members AS (
    SELECT 
       group_id,
       GROUP_CONCAT(group_member_id ORDER BY group_member_id ASC, ',' ) AS members
    FROM data 
    GROUP  BY group_id
)
SELECT 
    group_id
FROM group_members gm
WHERE EXISTS (
    SELECT 1 FROM group_members gm1 WHERE gm1.group_id != gm.group_id AND gm1.members = gm.members
)

fiddle


For older versions of MySQL it can be like this

SELECT 
    group_id
FROM (
    SELECT 
       group_id,
       GROUP_CONCAT(group_member_id ORDER BY group_member_id ASC, ',' ) AS members
    FROM data 
    GROUP  BY group_id
) gm
WHERE EXISTS (
    SELECT 1 FROM (
        SELECT 
           group_id,
           GROUP_CONCAT(group_member_id ORDER BY group_member_id ASC, ',' ) AS members
        FROM data 
        GROUP  BY group_id
    ) gm1 WHERE gm1.group_id != gm.group_id AND gm1.members = gm.members
)

fiddle


All queries produces the same results

group_id
A01
C01
E01
  • Related