I need 2 regex case insensitive patterns. One of them are equivalent of SQL's %
. So %word%
. My attempt at this was '^[a-zA-Z]*word[a-zA-Z]*$'
.
Question 1: This seems to work, but I am wondering if this is the equivalent of %word%
.
Finally the last pattern being similar to %
, but requires 3 or more characters either before and after the word. So for example if the target word was word
:
words
= matched because it doesn't have 3 or more characters either before or after it.swordfish
= not matched because it has 3 or more characters afterword
theword
= not matched because it has 3 or more characters before itmywordly
= matched because it doesn't contain 3 or more characters before or afterword
.miswordeds
= not matched because it has 3 characters before it. (it also has 3 words after it too, but it met the criteria already).
Question 2: For the second regex, I am not very sure how to start this. I will be using the regex in a MySQL query using the REGEXP
function for example:
SELECT 1
WHERE 'SWORDFISH' REGEXP '^[a-zA-Z]*word[a-zA-Z]*$'
CodePudding user response:
First Question:
According to https://dev.mysql.com/doc/refman/8.0/en/string-comparison-functions.html#operator_like
With LIKE you can use the following two wildcard characters in the pattern:
- % matches any number of characters, even zero characters.
- _ matches exactly one character.
It means the REGEX ^[a-zA-Z]*word[a-zA-Z]*$'
is not equivalent to %word%
Second Question:
Change * to {0,2} to indicate you want to match at maximum 2 characters either before or after it:
SELECT 1
WHERE 'SWORDFISH' REGEXP '^[a-zA-Z]{0,2}word[a-zA-Z]{0,2}$'
And to make case insensitive:
SELECT 1 WHERE LOWER('SWORDFISH') REGEXP '^[a-z]{0,2}word[a-z]{0,2}$'
CodePudding user response:
Assuming
- The test string (or column) has only letters. (Hence, I can use
.
instead of[a-z]
). - Case folding and accents are not an issue (presumably handled by a suitable
COLLATION
).
Either way:
WHERE x LIKE '%word%' -- found the word
AND x NOT LIKE '%___word%' -- but fail if too many leading chars
AND x NOT LIKE '%word___%' -- or trailing
WHERE x RLIKE '^.{0,2}word.{0,2}$'
I vote for RLIKE being slightly faster than LIKE -- only because there are fewer ANDs
.
(MySQL 8.0 introduced incompatible regexp syntax; I think the syntax above works in all versions of MySQL/MariaDB. Note that I avoided word boundaries and character class shortcuts like \\w
.)