Home > database >  Find NON-duplicate column value combinations
Find NON-duplicate column value combinations

Time:05-03

This is for a migration script.

CompanyTable:

EmployeeId DivisionId
abc div1
def div1
abc div1
abc div2
xyz div2

In the below code I am Selecting duplicate EmployeeId-DivisionId combinations, that is, the records that have the same EmployeeId and DivisionId will be selected. So from the above table, the two rows that have abc-div1 combination will be selected by the below code.

How can I invert it? It seems so simple but I can't figure it out. I tried replacing with HAVING count(*) = 0 instead of > 1, I've tried fiddling with the equality signs in the ON and AND lines. Basically from the above table, I want to select the other three rows that don't have the abc-div1 combination. If there is a way to select all the unique EmployeeID-DivisionId combinations, let me know.

SELECT a.EmployeeID, a.DivisionId FROM CompanyTable a
  JOIN ( SELECT EmployeeID, DivisionId 
         FROM CompanyTable 
         GROUP BY EmployeeID, DivisionId 
         HAVING count(*) > 1 ) b
    ON a.EmployeeID = b.EmployeeID
   AND a.DivisionId = b.DivisionId;

EmployeeId and DivisionId are both nvarchar(50) columns.

CodePudding user response:

A windowed count would seem a suitable method:

select employeeid, divisionid
from (
    select *, Count(*) over(partition by employeeid, divisionid) ct
    from t
)t
where ct = 1;

CodePudding user response:

As already mentioned, you must replace > 1 by its real opposite <= 1, this works: db<>fiddle

CodePudding user response:

First, let's try rewriting your query using a common table expression (CTE), instead of a subquery:

WITH cteCompanyTableStats as (
    SELECT 
        EmployeeID, DivisionId, 
        HasDuplicates = CASE WHEN count(*) > 1 THEN1 ELSE 0 END
    FROM CompanyTable 
    GROUP BY EmployeeID, DivisionId 
)
SELECT ct.*
FROM CompanyTable ct
    inner join cteCompanyTableStats cts on
        ct.EmployeeId = cts.EmployeeId 
        and ct.DivisionId = cts.DivisionId
        and cts.HasDuplicates = 1

Notice how I've removed the HAVING clause & added a new HasDuplicates column? We're going to use that new column to find all of the table rows that -DON'T- have duplicates:

WITH cteCompanyTableStats as (
    SELECT 
        EmployeeID, DivisionId, 
        HasDuplicates = CASE WHEN count(*) > 1 THEN1 ELSE 0 END
    FROM CompanyTable 
    GROUP BY EmployeeID, DivisionId 
)
SELECT ct.*
FROM CompanyTable ct
    inner join cteCompanyTableStats cts on
        ct.EmployeeId = cts.EmployeeId 
        and ct.DivisionId = cts.DivisionId
        and cts.HasDuplicates = 0

The only character of SQL code that changed between the two queries was the last line, where and cts.HasDuplicates = ### is set.

  • Related