Home > OS >  T-SQL Exclude scenarios where two or more records have two or more fields with the same value but on
T-SQL Exclude scenarios where two or more records have two or more fields with the same value but on

Time:02-11

Given the table below:

DECLARE @Data TABLE 
              (
                   Id INT,
                   Branch VARCHAR(50),
                   Color VARCHAR(50), 
                   Name VARCHAR(50),
                   Model VARCHAR(50)
               )

INSERT INTO @Data 
VALUES (1, 'London', 'Red', 'Mazda 3', '2015'), 
       (2, 'London', NULL, 'Mazda 3', '2015'),
       (3, 'London', 'Red', 'Mazda 2', '2014'),
       (4, 'London', 'Red', 'Fiesta', '2020'),
       (5, 'London', 'Yellow', 'Focus', '2010'),
       (6, 'London', NULL, 'Fiesta', '2020')
Id Branch Color Name Model
1 London Red Mazda 3 2015
2 London null Mazda 3 2015
3 London Red Mazda 2 2014
4 London Red Fiesta 2020
5 London Yellow Focus 2010
6 London null Fiesta 2020

I am aiming to exclude those records where the Branch, Name and Model are the same, but one of the rows has a specific Color value and the second row is null.

So, I would be only returning the id's below as the desired result, so Id's 1,2,4 and 6 should be excluded.:

Id Branch Color Name Model
4 London Blue Fiesta 2020
5 London Yellow Focus 2010

I have tried the query below:

SELECT Data1.* 
FROM @Data Data1 
INNER JOIN @Data Data2 ON Data1.Branch = Data2.Branch 
                       AND Data1.Name = Data2.Name    
                       AND Data1.Model = Data2.Model     
                       AND Data1.Color = Data2.Color

But I still get the ones where the Color is not null as shown below.

Id Branch Color Name Model
1 London Red Mazda 3 2015
3 London Red Mazda 2 2014
4 London Blue Fiesta 2020
5 London Yellow Focus 2010

Any guidance will be really appreciated !

CodePudding user response:

Not 100% sure what you are trying to achieve, but it appears you want to only list data where color is populated in each row if there is more than 1 row

Filter using Duplicate Count vs Color Count

;WITH cte_Count AS (
SELECT ID
    ,Branch
    ,Name
    ,Model
    ,DupCnt = COUNT(*) OVER (PARTITION BY Branch,Name,Model)
    ,ColorCnt = COUNT(Color) OVER (PARTITION BY Branch,Name,Model)
FROM @Data
)

SELECT *
FROM cte_Count
WHERE DupCnt = 1 /*Only 1 row for that combo of Branch-Name-Model*/
OR DupCnt = ColorCnt /*Color populated in each row*/

CodePudding user response:

Schema and insert statements:

 DECLARE @Data TABLE 
               (
                    Id INT,
                    Branch VARCHAR(50),
                    Color VARCHAR(50), 
                    Name VARCHAR(50),
                    Model VARCHAR(50)
                )
 
 INSERT INTO @Data 
 VALUES (1, 'London', 'Red', 'Mazda 3', '2015'), 
        (2, 'London', NULL, 'Mazda 3', '2015'),
        (3, 'London', 'Red', 'Mazda 2', '2014'),
        (4, 'London', 'Red', 'Fiesta', '2020'),
        (5, 'London', 'Yellow', 'Focus', '2010'),
        (6, 'London', NULL, 'Fiesta', '2020')
 
 

Query:

 Select * from @Data Data0
 where not exists(
                    select * from @Data Data1 
                    inner join @Data Data2
                    on Data1.Branch=Data2.Branch and data1.Model=data2.Model and data1.id<>data2.Id
                    where Data1.color is not null
                    and Data2.color is null and (data0.id =data1.id or data0.id=data2.Id)
                )

Output:

Id Branch Color Name Model
3 London Red Mazda 2 2014
5 London Yellow Focus 2010

db<>fiddle here

  • Related