MySQL - Regexp for word boundary excluding underscore (connector punctuation)

Time:10-05

I'm using the regex word boundary \b to match a word in the sentence below, but the result is not what I need: connector punctuation characters (such as the underscore) are not treated as word boundaries.

Sentence: ab﹎cd_de_gf|ij|kl|mn|op_

Regexp: \\bkl\\b

However, de is not matched by the equivalent pattern \\bde\\b, because the underscores around it count as word characters.

I tried updating the regexp to use Unicode connector punctuation (it's a product requirement, as we support CJK languages as well), but that isn't working.

Regexp: (?<=\\b|[\p{Pc}])de(?=\\b|[\p{Pc}])

What am I missing here?

Note: (?<=\\b|_)de(?=\\b|_) seems to work for underscores, but I need the regex to work for all connector punctuation characters.

Thanks in advance !!

CodePudding user response:

Based on the use case you have described, you can simplify your regex to:

(?<![[:alnum:]])de(?![[:alnum:]])

instead of trying to match word boundaries, unicode punctuation characters etc.

This will match de only if it is neither preceded nor followed by an alphanumeric character.
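The lookaround idea can be tried outside MySQL. The following is only a sketch in Python, whose re module is a different engine from the one MySQL uses: it has no [[:alnum:]] class, so [^\W_] (a word character other than the underscore) is used as a stand-in. The helper names isolated and plain_boundary are made up for this illustration.

```python
import re

# Sentence from the question.
SENTENCE = "ab﹎cd_de_gf|ij|kl|mn|op_"

# Assumption: [^\W_] approximates [[:alnum:]] in Python's re module.
ALNUM = r"[^\W_]"


def isolated(word):
    """Pattern that matches `word` when no alphanumeric character touches it."""
    return re.compile(rf"(?<!{ALNUM}){re.escape(word)}(?!{ALNUM})")


def plain_boundary(word):
    """Pattern that uses the ordinary \\b word boundary for comparison."""
    return re.compile(rf"\b{re.escape(word)}\b")


if __name__ == "__main__":
    for word in ("de", "kl"):
        print(word,
              "\\b:", bool(plain_boundary(word).search(SENTENCE)),
              "lookaround:", bool(isolated(word).search(SENTENCE)))
```

With \b, the underscores on either side of de are word characters, so no boundary exists there. The negative lookarounds only reject letters and digits, so underscores and other punctuation are allowed next to the word.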

CodePudding user response:

To match any connector punctuation character you need \p{Pc}, with the backslash doubled inside a MySQL string literal:

(?<=\\b|\\p{Pc})de(?=\\b|\\p{Pc})

NOTE: \p{Pc} can also be written as the explicit class [_\u203F\u2040\u2054\uFE33\uFE34\uFE4D-\uFE4F\uFF3F], which lists the same 10 characters.
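The explicit character list in the note can be cross-checked against the Unicode database. The following is a sketch in Python using the standard unicodedata module. The result depends on the Unicode version bundled with the interpreter, and the names LISTED, connector_punctuation and not_connector are invented for this example.

```python
import sys
import unicodedata

# The ten code points from the explicit class in the note above
# (the range \uFE4D-\uFE4F is expanded to its three members).
LISTED = "_\u203F\u2040\u2054\uFE33\uFE34\uFE4D\uFE4E\uFE4F\uFF3F"


def connector_punctuation():
    """Every code point this Python build classifies as Pc."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Pc"]


def not_connector(chars):
    """Characters in `chars` whose general category is not Pc."""
    return [c for c in chars if unicodedata.category(c) != "Pc"]


if __name__ == "__main__":
    for c in connector_punctuation():
        print(f"U+{ord(c):04X} {unicodedata.name(c)}")
    print("listed but not Pc:", not_connector(LISTED))
```

If a later Unicode version adds characters to the Pc category, the hard-coded class will silently miss them, whereas \p{Pc} follows whatever the regex engine's Unicode data defines.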
