Home > Software engineering >  Remove duplicates based on multiple columns and grouping by different conditions
Remove duplicates based on multiple columns and grouping by different conditions

Time:10-07

Sample Data:

| Plant | Part | Action |
| ----- | ---- | ------ |
| 101 | 123 | Action1 |
| 101 | 123 | Action2 |
| 101 | 123 | Action3 |
| 101 | 123 | Action4 |
| 101 | 124 | Action2 |
| 101 | 124 | Action4 |
| 101 | 125 | Action1 |
| 101 | 125 | Action2 |

In the above sample data, I want to delete the rows with Action1, Action2, and Action3 where the same Plant-Part combinations appear in Action4. This is simplified from the actual use case for clarity (The names of the actions do not have numbers). If the plant-part is not in Action4 then it should remain in whatever Actions (1,2,3) it was in before. The plant-part combinations could be in all Actions or none or any combination in between.

Desired Result:

| Plant | Part | Action |
| ----- | ---- | ------ |
| 101 | 123 | Action4 |
| 101 | 124 | Action4 |
| 101 | 125 | Action1 |
| 101 | 125 | Action2 |

I feel like this is relatively simple, but I can't seem to get the grouping right or wrap my head around it.

CodePudding user response:

We mark every time action = 'action4', use count() over() partitioned by plant and part to make groups out of them. If action4 is in the group we present that one and if not we present the entire group.

select  *
from    (
        select   Plant
                ,Part
                ,case when count(case when action = 'action4' then 1 end) over(partition by plant, part) < 1 then action when action = 'action4' then action end as Action 
        from    t
        ) t
where   action is not null
Plant Part Action
101 123 Action4
101 124 Action4
101 125 Action1
101 125 Action2

Fiddle

CodePudding user response:

This would seem to require a simple correlation using exists:

delete from t
where exists (
    select * from t t2 
    where t2.plant = t.plant and t2.part = t.part 
    and t2.action = 'action4' 
    and t.action in ('Action1','Action2','Action3')
);

See result in DB<>Fiddle

CodePudding user response:

DELETE t1
FROM #TestTable t1 
INNER JOIN (
    SELECT *
    FROM #TestTable
    WHERE [Action] IN ('Action4') -- I used 'IN' because there's more than just this
    ) t2 ON t1.Plant = t2.Plant AND t1.PartNumber = t2.PartNumber
WHERE t1.[Action] IN ('Action1', 'Action2', 'Action3')

This is what I ended up doing, probably a more elegant way...

  • Related