I'm pretty used to the PCRE regex flavor at this point, and some other alternatives like plain grep
's style. But, I couldn't figure out how to do some basic things in the new built-in SQLite regexes (version 3.37.2)
Attempting to match a whole string with ^
and $
anchors works as expected:
sqlite> SELECT 'pancakes' REGEXP '^pancakes$';
1
I'd expect this next one to work. I always thought |
had "everything to left" and "everything to right" behavior (low "precedence"?):
sqlite> SELECT 'pancakes' REGEXP '^pan|cakes$';
0
That works fine in Python:
>>> re.search(r"^pan|cakes$", "pancakes")
<re.Match object; span=(0, 3), match='pan'>
Maybe I have to escape |
? But doesn't work:
sqlite> SELECT 'pancakes' REGEXP '^pan\|cakes$';
0
Does |
work at all? Yes, it seems to, but maybe not doing what I think:
sqlite> SELECT 'pancakes' REGEXP 'pan|cakes';
1
Maybe the precedence is off, I'll group to be safe. Uh oh:
sqlite> SELECT 'pancakes' REGEXP '(^pan)|(cakes$)';
Error: stepping, unmatched '(' (1)
(escaping the parens also gave the same error)
How can I do grouping, anchoring, |
etc? I wasn't able to find much about the regex style in the docs, just something saying that REGEXP
operator is really using some regex function under the hood.
CodePudding user response:
I found the code for the default regexp implementation: https://sqlite.org/src/file/ext/misc/regexp.c
The syntax documented in a code comment block suggests that the syntax you used should work, but I find it does not. I tested on 3.37.0 and then upgraded to 3.39.3 but it still fails.
I was able to get it to work this way:
sqlite> SELECT 'pancakes' REGEXP '(^pan)|cakes$';
1
It seems in the SQLite regex parser, |
binds tighter than ^
or $
, so without the parentheses, it evaluates as if we had written it this way:
sqlite> SELECT 'pancakes' REGEXP '^(pan|cakes)$';
0
This is obviously not going to match, because it would only match the whole string "pan" or "cakes".
This is not compatible with PCRE semantics.
I also tried to use explicit parentheses to override the precedence of |
:
sqlite> SELECT 'pancakes' REGEXP '(^pan)|(cakes$)';
Runtime error: unmatched '('
This should work, but it hits a bug in SQLite with respect to ^
and $
. Read these threads for the discussion from July 2022: