We've been pulling our hair out debugging Snowflakes regexp_like/rlike function on a string column. It started silently failing and always returning a False when the string contained a line break (\n).
Example:
select regexp_like('looking for a keyword', '.*keyword.*', 'i')
-> True
select regexp_like('looking for a
keyword', '.*keyword.*', 'i')
-> False
Both expressions, however, should evaluate to True.
Some observations:
- The ilike function doesn't seem to suffer from this issue.
select 'looking for a \n keyword' ilike '%keyword%'
-> True
ilike and regexp_like are differently implemented which would explain why one works but not the other.
- The (programmatic) workaround is to strip all linebreak characters.
select regexp_like(replace('looking for a
keyword','\n',''), '.*keyword.*', 'i')
->True
We obtained the string entries in our table from websites parsed with beautiful soup. While beautifulsoup removes HTML <br>
line breaks, it doesn't seem to remove the line breaks encoded through \n and \r. In retrospective, I see that regexp could be tripped up by a newline character.
Is this expected behavior and specific to Snowflake?
CodePudding user response:
It is possible to provide parameter s
:
select regexp_like('looking for a
keyword', '.*keyword.*', 'is')
-- TRUE
Specifying the Parameters for the Regular Expression:
s - Enables the POSIX wildcard character . to match \n. By default, wildcard character matching is disabled.
The default string is simply c, which specifies:
POSIX wildcard character . does not match \n newline characters.