Home > Software engineering >  Remove duplicate rows from multiple criteria using SQL queries
Remove duplicate rows from multiple criteria using SQL queries

Time:01-23

I have a table with columns Machine, Product and Sources:

Machine Product Sources
M3 H cmdd6
M3 H 91
M3 H cmdd3
M4 I cmdd7
M4 J cmdd7
M4 B 827
M4 B cmdd7

In the above table where Machine is M3 the product is same but the Sources column has multiple intake. So the requirement is to remove the duplicate rows where Sources should always be 'cmdd' in ascending order.

For example if there is duplicate with product and sources are different i.e 'cmdd6' or 'cmdd3', then duplicate row should be removed and values would remain with sources 'cmdd3'.

Below is the result table would to like to achieve

Machine Product Sources
M3 H cmdd3
M4 I cmdd7
M4 J cmdd7
M4 B cmdd7

Below is the query which I tried to remove duplicates on the values of count >1.

WITH CTE(Machine, Product,Sources, duplicatecount) AS 
(
    SELECT 
        Machine, Product, Sources, 
        ROW_NUMBER() OVER (PARTITION BY Machine, Product
                           ORDER BY Machine, Sources) AS DuplicateCount
    FROM 
        Concatcleanup 
)
DELETE FROM cte
WHERE duplicatecount > 1

Any help is highly appreciated.

CodePudding user response:

You can use one extra crafted field inside the ORDER BY clause ROW_NUMBER window function, to pull "cmdd%"-like values above all the others.

WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER(
               PARTITION BY Machine, Product
               ORDER BY CASE WHEN Sources NOT LIKE 'cmdd%' THEN 1 END,
                        Sources
           ) AS DuplicateCount                 
    FROM Concatcleanup
)
DELETE FROM cte
WHERE DuplicateCount > 1;

Check the demo here.


If you want to avoid the deletion, you can use the notation SELECT ... INTO <new_table> FROM ... and use the code for the cte:

SELECT Machine, Product, Sources,  
       ROW_NUMBER() OVER(
           PARTITION BY Machine, Product
           ORDER BY CASE WHEN Sources NOT LIKE 'cmdd%' THEN 1 END,
                    Sources
       ) AS DuplicateCount
INTO newtab
FROM Concatcleanup;

Check the demo here.

  • Related