I have table1 and table2 as follows:
Table1:
col1 col2
-------------
a1 b1
a2 b1
a3 b2
a4 b3
a5 b3
a5 b4
a5 b2
Table2:
col2 col3
----------
b1 c1
b4 c2
To get all duplicate entries for col2 in table1, I have written following query:
SELECT x.col1,x.col2
FROM table1 x
JOIN (SELECT t.col2
FROM table1 t
GROUP BY t.col2
HAVING COUNT(t.col2) > 1) y ON y.col2 = x.col2
Now I want to remove the entries from above result which are in table2
Expected output:
col1 col2
----------
a3 b2
a4 b3
a5 b3
a5 b2
Query I wrote:
SELECT x.col1,x.col2
FROM table1 x
JOIN (SELECT t.col2
FROM table1 t
GROUP BY t.col2
HAVING COUNT(t.col2) > 1) y ON y.col2 = x.col2 where x.col2 not in (select col2 from table2)
I see the expected results using above query. Is there a more efficient of achieving the same result? and are there any cases that I could be missing?
Thanks
CodePudding user response:
This script leaves b4 from Table2 because b4 in col2 is not a duplicate in col2 from Table1.
DROP TABLE IF EXISTS Table1
DROP TABLE IF EXISTS Table2
CREATE TABLE Table1
(
col1 VARCHAR(10),
col2 VARCHAR(10)
)
GO
CREATE TABLE Table2
(
col2 VARCHAR(10),
col3 VARCHAR(10)
)
GO
INSERT INTO Table1
VALUES
('a1', 'b1'),
('a2', 'b1'),
('a3', 'b2'),
('a4', 'b3'),
('a5', 'b3'),
('a5', 'b4'),
('a5', 'b2')
INSERT INTO Table2
VALUES
('b1', 'c1'),
('b4', 'c2')
SELECT T2.*
FROM Table2 T2
LEFT JOIN
(
SELECT col2
FROM Table1
GROUP BY col2
HAVING COUNT(*) > 1
) T1 ON T1.col2 = T2.col2
WHERE T1.col2 IS NULL
CodePudding user response:
Depending on which DBMS you're actually using, you could use window functions and exist expressions.
SELECT
*
FROM
(
SELECT
*,
COUNT(*) OVER (PARTITION BY col2) AS occurences
FROM
table1
)
t1
WHERE
occurrences > 1
AND
NOT EXISTS (
SELECT *
FROM table2
WHERE col2 = t1.col2
)
CodePudding user response:
If your DBMS
supports CTE, Common Table Expressions
you may try the following:
with cte as (
select col1,col2, count(*) over (partition by col2) as cn from Table1
)
select T.col1,T.col2 from cte T
left join Table2 D on T.col2=D.col2
where T.cn>1 and D.col2 is null
See a demo from here.
This works on MySQL 8.0
and above and PostgreSQL
.