Home > database >  Remove slightly different duplicates
Remove slightly different duplicates

Time:12-08

The goal is to not show duplicate with slight differences. Here is an example:

I want to remove what is in green

enter image description here

This is the query I am currently using, but not quite accurate.

SELECT Package
FROM myTable
WHERE Package NOT IN (SELECT Package FROM myTable WHERE Package NOT LIKE '%_AC')

CodePudding user response:

With just your sample data to go on you could try using a not exists criteria, such as

select * 
from t
where not exists (
    select * from t t2 
    where t.package like Concat(t2.package,'%')
    and t.package != t2.package
);

CodePudding user response:

A couple of ideas. If you have a unique column, you could use a NOT EXISTS:

SELECT P
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.YourTable e
                  WHERE (YT.P LIKE e.P   '%'
                     OR  e.P LIKE YT.P   '%')
                    AND YT.ID != e.ID);

If not, then you could use a CTE and a windowed COUNT:

WITH CTE AS(
    SELECT YT1.*,
           COUNT(YT1.P) OVER (PARTITION BY YT1.P) AS C
    FROM dbo.YourTable YT1
         JOIN dbo.YourTable YT2 ON YT1.P LIKE YT2.P   '%'
                                OR YT2.P LIKE YT1.P   '%') --Will always JOIN to itself
SELECT P
FROM CTE
WHERE C = 1;

db<>fiddle

  • Related