I have a table with columns Machine
, Product
and Sources
:
Machine | Product | Sources |
---|---|---|
M3 | H | cmdd6 |
M3 | H | 91 |
M3 | H | cmdd3 |
M4 | I | cmdd7 |
M4 | J | cmdd7 |
M4 | B | 827 |
M4 | B | cmdd7 |
In the above table where Machine
is M3 the product is same but the Sources
column has multiple intake. So the requirement is to remove the duplicate rows where Sources
should always be 'cmdd' in ascending order.
For example if there is duplicate with product and sources are different i.e 'cmdd6' or 'cmdd3', then duplicate row should be removed and values would remain with sources 'cmdd3'.
Below is the result table would to like to achieve
Machine | Product | Sources |
---|---|---|
M3 | H | cmdd3 |
M4 | I | cmdd7 |
M4 | J | cmdd7 |
M4 | B | cmdd7 |
Below is the query which I tried to remove duplicates on the values of count >1.
WITH CTE(Machine, Product,Sources, duplicatecount) AS
(
SELECT
Machine, Product, Sources,
ROW_NUMBER() OVER (PARTITION BY Machine, Product
ORDER BY Machine, Sources) AS DuplicateCount
FROM
Concatcleanup
)
DELETE FROM cte
WHERE duplicatecount > 1
Any help is highly appreciated.
CodePudding user response:
You can use one extra crafted field inside the ORDER BY
clause ROW_NUMBER
window function, to pull "cmdd%"-like values above all the others.
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER(
PARTITION BY Machine, Product
ORDER BY CASE WHEN Sources NOT LIKE 'cmdd%' THEN 1 END,
Sources
) AS DuplicateCount
FROM Concatcleanup
)
DELETE FROM cte
WHERE DuplicateCount > 1;
Check the demo here.
If you want to avoid the deletion, you can use the notation SELECT ... INTO <new_table> FROM ...
and use the code for the cte:
SELECT Machine, Product, Sources,
ROW_NUMBER() OVER(
PARTITION BY Machine, Product
ORDER BY CASE WHEN Sources NOT LIKE 'cmdd%' THEN 1 END,
Sources
) AS DuplicateCount
INTO newtab
FROM Concatcleanup;
Check the demo here.