I have a bunch of abbreviations I'd like to use in regex matches, but they contain lots of regex reserved characters (like . ? $).
In Python you can get an escaped (regex-safe) string using re.escape. For example:
re.escape("Are U.S. Pythons worth any $$$?")
will return 'Are\\ U\\.S\\.\\ Pythons\\ worth\\ any\\ \\$\\$\\$\\?'
From my (little) experience with Julia so far, I can tell there's probably a much more straightforward way of doing this in Julia, but I couldn't find any previous answers on the topic.
CodePudding user response:
Julia uses the PCRE2 library under the hood, and relies on PCRE's regex-quoting syntax to escape special characters automatically when you concatenate a Regex with an ordinary String using *. For example:
julia> r"\w \s*" * raw"Are U.S. Pythons worth any $$$?"
r"(?:\w \s*)\QAre U.S. Pythons worth any $$$?\E"
Here we've used a raw string literal to make sure that none of the characters are interpreted as special, including the $s.
If we need interpolation, we can use a normal String literal instead. In this case, the interpolation is done first, and then the result is quoted with \Q ... \E.
julia> snake = "Python"
"Python"
julia> r"\w \s*" * "Are U.S. $snake worth any money?"
r"(?:\w \s*)\QAre U.S. Python worth any money?\E"
So you can place the part of the regex you wish to be quoted in a normal String, and it will be quoted automatically when you join it with a Regex.
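To check that the quoting really changes what matches, here is a minimal sketch. The abbreviation, the sample sentences, and the variable names are made up for illustration, and it assumes a Julia version where Regex * String concatenation is available, as in the session above.

```julia
abbrev = "U.S."   # hypothetical abbreviation containing regex metacharacters

# Concatenating with * quotes the String part, so each dot only matches a literal dot.
quoted = r"\bthe\s+" * abbrev

# For contrast: pasting the text into the pattern source leaves the dots as wildcards.
unquoted = Regex("\\bthe\\s+" * abbrev)

in_quoted_real   = occursin(quoted, "in the U.S. today")
in_quoted_fake   = occursin(quoted, "in the UXSY today")
in_unquoted_fake = occursin(unquoted, "in the UXSY today")
```

The quoted pattern should accept only the sentence with the real abbreviation, while the unquoted one also accepts "UXSY" because each . matches any character.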
You can even do it directly within the regex yourself: \Q starts a region where none of the regex-special characters are interpreted as special, and \E ends that region. Everything within such a region is treated literally by the regex engine.
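For the original use case of a whole list of abbreviations, one possible approach is to wrap each entry in \Q ... \E by hand and join the pieces into a single alternation. This is only a sketch: the abbreviation list and sample text are invented, and it assumes that no abbreviation contains the sequence \E itself, since that would end the quoted region early.

```julia
# Abbreviations full of regex metacharacters (hypothetical sample data).
abbrevs = ["U.S.", "e.g.", raw"$$$", "etc."]

# Wrap each abbreviation in \Q...\E so the regex engine treats it literally,
# then join the quoted pieces with | to match any one of them.
# Assumption: no abbreviation contains the sequence \E.
pattern = Regex(join(("\\Q" * a * "\\E" for a in abbrevs), "|"))

text = raw"Are U.S. Pythons worth any $$$, e.g. in UXSX?"

# Collect every literal occurrence, in order of appearance.
found = [m.match for m in eachmatch(pattern, text)]
```

With this input, found should hold the three literal hits U.S., $$$ and e.g., while UXSX is skipped because the dots are no longer wildcards.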