Home > Enterprise >  Find all rows that have Non-ASCII in a specific column in SnowFlake
Find all rows that have Non-ASCII in a specific column in SnowFlake

Time:11-04

I'm trying to find all rows in my table that have Non-ASCII characters (one or more) in a specific column in Snowflake. I've tried to use the query below. Although, it didn't give me the results I was looking for.

SELECT *
FROM table_name
WHERE column_name REGEXP '[^a-zA-Z0-9]; 

In addition, I have tried this query as well:

SELECT *
  FROM tableName 
 WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)

This gave me error because I can't use USING in Snowflake.

Just to elaborate a bit more, I'm looking to find rows with characters like Ã, Ä, Ç, Ô, ÿ, etc. To be clear, these are only examples I'm trying to find all rows where there are one or more cases of non-ASCII characters.

CodePudding user response:

If you just want to detect whether or not a column has non-ASCII, you can do it like this:

set MY_STRING1 = 'Hello, world';
set MY_STRING2 = 'Ãbcde';

select regexp_instr($MY_STRING1,'[^[:ascii:]]') > 0 as HAS_NON_ASCII;
select regexp_instr($MY_STRING2,'[^[:ascii:]]') > 0 as HAS_NON_ASCII;

If you want to replace diacritics with the closest ASCII match you can use this approach:

How to replace character accented characters in Snowflake?

  • Related