Home > Software engineering >  Regex pattern equivalent of %word% in mysql
Regex pattern equivalent of %word% in mysql

Time:07-05

I need 2 regex case insensitive patterns. One of them are equivalent of SQL's %. So %word%. My attempt at this was '^[a-zA-Z]*word[a-zA-Z]*$'.

Question 1: This seems to work, but I am wondering if this is the equivalent of %word%.

Finally the last pattern being similar to %, but requires 3 or more characters either before and after the word. So for example if the target word was word:

  • words = matched because it doesn't have 3 or more characters either before or after it.
  • swordfish = not matched because it has 3 or more characters after word
  • theword = not matched because it has 3 or more characters before it
  • mywordly = matched because it doesn't contain 3 or more characters before or after word.
  • miswordeds = not matched because it has 3 characters before it. (it also has 3 words after it too, but it met the criteria already).

Question 2: For the second regex, I am not very sure how to start this. I will be using the regex in a MySQL query using the REGEXP function for example:

SELECT 1
WHERE 'SWORDFISH' REGEXP '^[a-zA-Z]*word[a-zA-Z]*$'

CodePudding user response:

First Question:

According to https://dev.mysql.com/doc/refman/8.0/en/string-comparison-functions.html#operator_like

With LIKE you can use the following two wildcard characters in the pattern:

  • % matches any number of characters, even zero characters.
  • _ matches exactly one character.

It means the REGEX ^[a-zA-Z]*word[a-zA-Z]*$' is not equivalent to %word%

Second Question:

Change * to {0,2} to indicate you want to match at maximum 2 characters either before or after it:

SELECT 1
WHERE 'SWORDFISH' REGEXP '^[a-zA-Z]{0,2}word[a-zA-Z]{0,2}$'

And to make case insensitive:

SELECT 1 WHERE LOWER('SWORDFISH') REGEXP '^[a-z]{0,2}word[a-z]{0,2}$'

CodePudding user response:

Assuming

  • The test string (or column) has only letters. (Hence, I can use . instead of [a-z]).
  • Case folding and accents are not an issue (presumably handled by a suitable COLLATION).

Either way:

WHERE x LIKE        '%word%'     -- found the word
  AND x NOT LIKE '%___word%'     -- but fail if too many leading chars
  AND x NOT LIKE    '%word___%'  -- or trailing

WHERE x RLIKE '^.{0,2}word.{0,2}$'

I vote for RLIKE being slightly faster than LIKE -- only because there are fewer ANDs.

(MySQL 8.0 introduced incompatible regexp syntax; I think the syntax above works in all versions of MySQL/MariaDB. Note that I avoided word boundaries and character class shortcuts like \\w.)

  • Related