T-SQL Exclude scenarios where two or more records have two or more fields with the same value but on-CodePudding

Given the table below:

DECLARE @Data TABLE 
              (
                   Id INT,
                   Branch VARCHAR(50),
                   Color VARCHAR(50), 
                   Name VARCHAR(50),
                   Model VARCHAR(50)
               )

INSERT INTO @Data 
VALUES (1, 'London', 'Red', 'Mazda 3', '2015'), 
       (2, 'London', NULL, 'Mazda 3', '2015'),
       (3, 'London', 'Red', 'Mazda 2', '2014'),
       (4, 'London', 'Red', 'Fiesta', '2020'),
       (5, 'London', 'Yellow', 'Focus', '2010'),
       (6, 'London', NULL, 'Fiesta', '2020')

Id	Branch	Color	Name	Model
1	London	Red	Mazda 3	2015
2	London	null	Mazda 3	2015
3	London	Red	Mazda 2	2014
4	London	Red	Fiesta	2020
5	London	Yellow	Focus	2010
6	London	null	Fiesta	2020

I am aiming to exclude those records where the Branch, Name and Model are the same, but one of the rows has a specific Color value and the second row is null.

So, I would be only returning the id's below as the desired result, so Id's 1,2,4 and 6 should be excluded.:

Id	Branch	Color	Name	Model
4	London	Blue	Fiesta	2020
5	London	Yellow	Focus	2010

I have tried the query below:

SELECT Data1.* 
FROM @Data Data1 
INNER JOIN @Data Data2 ON Data1.Branch = Data2.Branch 
                       AND Data1.Name = Data2.Name    
                       AND Data1.Model = Data2.Model     
                       AND Data1.Color = Data2.Color

But I still get the ones where the Color is not null as shown below.

Id	Branch	Color	Name	Model
1	London	Red	Mazda 3	2015
3	London	Red	Mazda 2	2014
4	London	Blue	Fiesta	2020
5	London	Yellow	Focus	2010

Any guidance will be really appreciated !

CodePudding user response：

Not 100% sure what you are trying to achieve, but it appears you want to only list data where color is populated in each row if there is more than 1 row

Filter using Duplicate Count vs Color Count

;WITH cte_Count AS (
SELECT ID
    ,Branch
    ,Name
    ,Model
    ,DupCnt = COUNT(*) OVER (PARTITION BY Branch,Name,Model)
    ,ColorCnt = COUNT(Color) OVER (PARTITION BY Branch,Name,Model)
FROM @Data
)

SELECT *
FROM cte_Count
WHERE DupCnt = 1 /*Only 1 row for that combo of Branch-Name-Model*/
OR DupCnt = ColorCnt /*Color populated in each row*/

CodePudding user response：

Schema and insert statements:

 DECLARE @Data TABLE 
               (
                    Id INT,
                    Branch VARCHAR(50),
                    Color VARCHAR(50), 
                    Name VARCHAR(50),
                    Model VARCHAR(50)
                )
 
 INSERT INTO @Data 
 VALUES (1, 'London', 'Red', 'Mazda 3', '2015'), 
        (2, 'London', NULL, 'Mazda 3', '2015'),
        (3, 'London', 'Red', 'Mazda 2', '2014'),
        (4, 'London', 'Red', 'Fiesta', '2020'),
        (5, 'London', 'Yellow', 'Focus', '2010'),
        (6, 'London', NULL, 'Fiesta', '2020')

Query:

 Select * from @Data Data0
 where not exists(
                    select * from @Data Data1 
                    inner join @Data Data2
                    on Data1.Branch=Data2.Branch and data1.Model=data2.Model and data1.id<>data2.Id
                    where Data1.color is not null
                    and Data2.color is null and (data0.id =data1.id or data0.id=data2.Id)
                )

Output:

Id	Branch	Color	Name	Model
3	London	Red	Mazda 2	2014
5	London	Yellow	Focus	2010

db<>fiddle here