I'm trying to find all rows in my table that have Non-ASCII characters (one or more) in a specific column in Snowflake. I've tried to use the query below. Although, it didn't give me the results I was looking for.
SELECT *
FROM table_name
WHERE column_name REGEXP '[^a-zA-Z0-9];
In addition, I have tried this query as well:
SELECT *
FROM tableName
WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)
This gave me error because I can't use USING
in Snowflake.
Just to elaborate a bit more, I'm looking to find rows with characters like Ã, Ä, Ç, Ô, ÿ, etc. To be clear, these are only examples I'm trying to find all rows where there are one or more cases of non-ASCII characters.
CodePudding user response:
If you just want to detect whether or not a column has non-ASCII, you can do it like this:
set MY_STRING1 = 'Hello, world';
set MY_STRING2 = 'Ãbcde';
select regexp_instr($MY_STRING1,'[^[:ascii:]]') > 0 as HAS_NON_ASCII;
select regexp_instr($MY_STRING2,'[^[:ascii:]]') > 0 as HAS_NON_ASCII;
If you want to replace diacritics with the closest ASCII match you can use this approach: