How to get duplicate records in one table which are not in other table?-CodePudding

I have table1 and table2 as follows:

Table1:

col1  col2
-------------
a1    b1
a2    b1
a3    b2
a4    b3
a5    b3
a5    b4
a5    b2

Table2:

col2  col3
----------
b1    c1
b4    c2

To get all duplicate entries for col2 in table1, I have written following query:

SELECT x.col1,x.col2
  FROM table1 x
  JOIN (SELECT t.col2
          FROM table1 t
      GROUP BY t.col2
        HAVING COUNT(t.col2) > 1) y ON y.col2 = x.col2

Now I want to remove the entries from above result which are in table2

Expected output:

col1  col2
----------
a3    b2
a4    b3
a5    b3
a5    b2

Query I wrote:

SELECT x.col1,x.col2
  FROM table1 x
  JOIN (SELECT t.col2
          FROM table1 t
      GROUP BY t.col2
        HAVING COUNT(t.col2) > 1) y ON y.col2 = x.col2 where x.col2 not in (select col2 from table2)

I see the expected results using above query. Is there a more efficient of achieving the same result? and are there any cases that I could be missing?

Thanks

CodePudding user response：

This script leaves b4 from Table2 because b4 in col2 is not a duplicate in col2 from Table1.

DROP TABLE IF EXISTS Table1
DROP TABLE IF EXISTS Table2

CREATE TABLE Table1
(
    col1 VARCHAR(10),
    col2 VARCHAR(10)
)
GO

CREATE TABLE Table2
(
    col2 VARCHAR(10),
    col3 VARCHAR(10)
)
GO

INSERT INTO Table1
VALUES
    ('a1', 'b1'),
    ('a2', 'b1'),
    ('a3', 'b2'),
    ('a4', 'b3'),
    ('a5', 'b3'),
    ('a5', 'b4'),
    ('a5', 'b2')

INSERT INTO Table2
VALUES
    ('b1', 'c1'),
    ('b4', 'c2')

SELECT T2.*
FROM Table2 T2
LEFT JOIN
(
    SELECT col2
    FROM Table1
    GROUP BY col2
    HAVING COUNT(*) > 1
) T1 ON T1.col2 = T2.col2
WHERE T1.col2 IS NULL

CodePudding user response：

Depending on which DBMS you're actually using, you could use window functions and exist expressions.

SELECT
  *
FROM
(
  SELECT
    *,
    COUNT(*) OVER (PARTITION BY col2) AS occurences
  FROM
    table1
)
  t1
WHERE
  occurrences > 1
  AND
  NOT EXISTS (
    SELECT *
      FROM table2
     WHERE col2 = t1.col2
  )

CodePudding user response：

If your DBMS supports CTE, Common Table Expressions you may try the following:

with cte as (
    select col1,col2, count(*) over (partition by col2) as cn from Table1
)
select T.col1,T.col2 from cte T
left join Table2 D on T.col2=D.col2
where T.cn>1 and D.col2 is null

See a demo from here. This works on MySQL 8.0 and above and PostgreSQL.