Home > OS >  Filter exact match with sentence/words using regex in Snowflake
Filter exact match with sentence/words using regex in Snowflake

Time:12-15

I want to filter columns that match a specific word or sentence. For instance, if I chose the word anti for filtering, among the following rows:

this is anti-pattern
antimosquitos products
the word anti is cool

I just want to keep the last one, since it's the only one that matches the entire matching pattern as a whole, and not within another word.

I have tried using RLIKE and I always get a FALSE value but I don't understand why. I expected to have a TRUE value at least in one of these SQL statements:

SELECT RLIKE ('Spray antimosquitos', '\bantimosquitos\b', 'i') as p1;
SELECT RLIKE ('Spray antimosquitos', '\santimosquitos\s', 'i') as p1;
SELECT RLIKE ('Spray antimosquitos', '\santimosquitos', 'i') as p1;
SELECT RLIKE ('Spray antimosquitos', '\santimosquitos', 'i') as p1;

However, this is not the case. I have tried with other forms like:

SELECT RLIKE ('Spray antimosquitos', '\b antimosquitos \b', 'i') as p1
SELECT RLIKE ('Spray antimosquitos', '*antimosquitos*', 'i') as p1
SELECT RLIKE ('Spray antimosquitos', ' antimosquitos ', 'i') as p1

And I don't manage to have a returned value TRUE.

What is wrong with my RLIKE?

CodePudding user response:

You need to

  • Ensure the backslashes are doubled as backslashes are used in string literals to create string escape sequences (like \t for tab, \n for a line feed, etc.)
  • Ensure that the pattern matches the entire string, it is just how RLIKE works. It is usually enabled by adding .* on both ends of the pattern.

So, in your case, you can use

'.*\\bantimosquitos\\b.*'

CodePudding user response:

The RLIKE function returns TRUE only if the regular expression matches the entire string:

SELECT RLIKE('Spray anti mosquitos', 'anti') as p1;  -- Returns FALSE because it doesn't match the whole string

SELECT RLIKE('Spray anti mosquitos', '.* anti .*'); -- Returns TRUE 

If you want a TRUE result on a partial match, you can use REGEXP_INSTR and see if the result is >0. If so, that will return TRUE:

SELECT REGEXP_INSTR('Spray anti mosquitos', 'anti', 1, 1, 0, 'i') > 0  as p1; -- Returns TRUE
  • Related