Home > Net >  Row Sum and Case insensitivity
Row Sum and Case insensitivity

Time:10-19

By having a table with thousands of Villages of course there are duplicates. In order to use them as mapping table it's necessary to mark them accordingly.

Table looks as follows;

id country name notUnique
1234567 Peru Arenal null
1234568 Mexico Arenal null
1234569 Honduras Arenal null
1234570 USA Pasadena null
1234571 Canada Pasadena null
1234572 Philippines Pasadeña null
1234573 Poland Susk Nowy null

now I use my code snippet to fill out the not Unique:

-- generate RowSum:
update
    Mapping.dbo.OsmGlobal
set
    notUnique = 1
from
    Mapping.dbo.OsmGlobal osm
 join   (
        SELECT [name], COUNT([name]) RowSum
        FROM Mapping.dbo.OsmGlobal osm
        Group by [name]
        ) cnt
    on osm.[name] = cnt.[name]
where cnt.RowSum > 1
go

obviously it results in result bellow, as there is no case insensitivity;

id country name notUnique
1234567 Peru Arenal 1
1234568 Mexico Arenal 1
1234569 Honduras Arenal 1
1234570 USA Pasadena 1
1234571 Canada Pasadena 1
1234572 Philippines Pasadeña null
1234573 Poland Susk Nowy null

To include 'Pasadeña' in my result groups of the 'Pasadena' I tried to add a collate Latin1_General_CI_AI. That is the point where I fail. Any help is highly welcome.

CodePudding user response:

I would use COLLATE on the name column, along with exists logic:

UPDATE Mapping.dbo.OsmGlobal o1
SET notUnique = 1
WHERE EXISTS (SELECT 1 FROM Mapping.dbo.OsmGlobal t2
              WHERE t2.name COLLATE Latin1_General_CI_AI =
                    t1.name COLLATE Latin1_General_CI_AI AND
                    t2.id <> t1.id);

In plain English, the above query says to set the notUnique flag to 1 for any record for which we can find another record (different id) having the same name, as collated using Latin1_General_CI_AI.

CodePudding user response:

As mentioned by @TimBiegeleisen, you should use a case-insensitive collation.

But you can do this more simply with an updatable CTE

WITH osm AS (
    SELECT *,
      COUNT(*) OVER (PARTITION BY (name COLLATE Latin1_General_CI_AI)) RowSum
    FROM Mapping.dbo.OsmGlobal osm
)
UPDATE osm
SET
    notUnique = 1
WHERE osm.RowSum > 1;
  • Related