locale letters not working in regex.test(string)

Time:11-13

I am trying to test a regex on a string with locale letters. Although I have been browsing Stack Overflow and googling for quite a while now, I can't figure out why it's not working.

There are 6 locale letters the regex would need to "recognize". Those are čČšŠžŽ.

I have tried various things. I tried just using the "locale words". This didn't work.

var regexList = [/\bžaba/gmi, /\bčešnje/gmi]
regexList[0].test("žaba") //returns false

The next thing that looked promising was adding/merging a "caron" (\u02C7) to the basic ASCII letters, as shown in this example. This was not a valid regex.

var regexList = [/\b[zZ]\u02C7aba/gmi, /\b[cC]\u02C7es\u02C7nje/gmi] //invalid regex

Although the first example works on regex101, it doesn't work in practice. Any help or references would be most welcome.

Other questions that I found useful can be found here and here, but they do not directly relate to my problem.

Thank you

CodePudding user response:

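The problem here is that the word boundary (\b) does not recognize your Unicode characters. In JavaScript, \b is defined in terms of \w, which only covers the ASCII characters [A-Za-z0-9_], so there is no boundary between the start of the string and ž. See here for a workaround: https://stackoverflow.com/a/10590516/10551293.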
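One possible workaround, as a minimal sketch and not necessarily the approach of the linked answer: replace \b with a negative lookbehind that rejects any preceding letter, digit, or underscore. The helper name startsWord is hypothetical, and the sketch assumes an engine that supports lookbehind and \p{...} property escapes with the u flag (current Node.js and browsers).

```javascript
// Hypothetical helper: builds a regex that matches `word` only when it is not
// preceded by a letter, digit, or underscore (a Unicode-aware stand-in for \b).
function startsWord(word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // "u" enables \p{L} and \p{N}; "i" keeps the match case-insensitive.
  // No "g" flag here: with "g", test() advances lastIndex between calls.
  return new RegExp(`(?<![\\p{L}\\p{N}_])${escaped}`, "iu");
}

const frog = startsWord("\u017eaba"); // \u017e is ž

console.log(frog.test("\u017eaba"));        // true
console.log(frog.test("zelena \u017eaba")); // true, preceded by a space
console.log(frog.test("x\u017eaba"));       // false, preceded by a letter
console.log(/\b\u017eaba/i.test("\u017eaba")); // false, the original problem
```

The g flag is left out on purpose: a global regex remembers lastIndex, so repeated test() calls on the same regex object can give alternating results.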

Apart from the \b, your regex works with literal Unicode characters. Alternatively, you could use Unicode code point escapes: /\u017eaba/. Code points have the advantage that they clearly state which character is to be matched, which is especially helpful for characters that look similar or identical to regular ASCII characters. If you are interested, you can convert characters here: https://www.branah.com/unicode-converter
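As a small illustration of the escapes, here is a sketch using the two words from the question, written with the precomposed code points (ž is U+017E, č is U+010D, š is U+0161). It also shows why appending a caron to a plain letter does not match. The caron in the question, U+02C7, is a standalone spacing character. The combining caron is U+030C, and even then the string has to be normalized to NFC before it equals the precomposed letter. The variable names are only for illustration.

```javascript
// Patterns written with escapes instead of literal letters.
const frog = /\u017eaba/i;             // žaba
const cherries = /\u010de\u0161nje/i;  // češnje

console.log(frog.test("žaba"));        // true
console.log(cherries.test("Češnje"));  // true, the i flag handles Č vs č

// "z" followed by the combining caron U+030C renders like ž but is a
// different code point sequence until it is normalized to NFC.
const decomposed = "z\u030Caba";
console.log(frog.test(decomposed));                   // false
console.log(frog.test(decomposed.normalize("NFC")));  // true
```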
