I am trying to test a regex against a string that contains locale-specific letters. Although I have been browsing Stack Overflow and googling for quite a while now, I can't figure out why it isn't working.
There are six locale letters the regex needs to "recognize": čČšŠžŽ.
I have tried various things. First, I simply put the "locale words" into the regex. This didn't work:
```javascript
var regexList = [/\bžaba/gmi, /\bčešnje/gmi]
regexList[0].test("žaba") // returns false
```
The next thing that looked promising was adding (merging) a "caron" to the basic ASCII letters, as in this example. This did not work either:
```javascript
// syntactically valid in JavaScript, but does not match "žaba"
var regexList = [/\b[zZ]\u02C7aba/gmi, /\b[cC]\u02C7es\u02C7nje/gmi]
```
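For reference, here is a minimal sketch of a caron-based approach that does match, assuming the input may be normalized first. U+02C7 is the spacing caron (ˇ), not the combining caron U+030C, and a typed ž is normally the single precomposed code point U+017E. Decomposing the input with `String.prototype.normalize("NFD")` turns ž into z followed by U+030C, which the regex can then match. The names `zabaDecomposed`, `matchesDecomposed` and `typed` are hypothetical.

```javascript
// Sketch: match "žaba" spelled as "z" + COMBINING CARON (U+030C) + "aba".
// U+02C7 (spacing caron "ˇ") never combines with "z", so it cannot work.
// \b is fine here because the decomposed word starts with ASCII "z".
const zabaDecomposed = /\b[zZ]\u030Caba/i;

// Hypothetical helper: decompose the input (NFD) before testing, so that
// "ž" (U+017E) becomes "z" followed by U+030C.
function matchesDecomposed(regex, text) {
  return regex.test(text.normalize("NFD"));
}

const typed = "\u017Eaba"; // "žaba" as normally typed (precomposed ž)

console.log(zabaDecomposed.test(typed));                   // false: precomposed ž
console.log(matchesDecomposed(zabaDecomposed, typed));     // true
console.log(matchesDecomposed(zabaDecomposed, "\u017Daba")); // true: "Žaba"
```

The trade-off is that every string has to be normalized before testing, and a `\b` placed right after a combining mark would still fail, because U+030C is not a word character either.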
While the first example works on regex101, it doesn't work in practice. Any help or references would be most welcome.
Other questions that I found useful can be found here and here, but they do not directly relate to my problem.
Thank you
Answer:
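The problem here is that the word boundary (\b) does not recognize your Unicode characters. In JavaScript, \b is defined in terms of \w, which does not include letters such as č, š or ž, so there is no boundary between the start of the string and ž. See here for a workaround: https://stackoverflow.com/a/10590516/10551293.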
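A minimal sketch of the problem and of one possible workaround (not necessarily the approach in the linked answer), assuming an ES2018+ engine that supports lookbehind and Unicode property escapes such as `\p{L}`. The helper `unicodeWordStart` is hypothetical: it replaces the leading \b with a lookbehind that rejects any preceding letter, combining mark, digit or underscore.

```javascript
// \b is defined via \w, and \w does not include letters such as ž,
// so the start of "žaba" is not a word boundary.
console.log(/\w/.test("ž"));         // false
console.log(/\bžaba/i.test("žaba")); // false

// Hypothetical helper: a "word start" that also treats non-ASCII letters as
// word characters. Needs ES2018+ (lookbehind and \p{...} with the u flag).
function unicodeWordStart(word) {
  // escape regex metacharacters so the word is matched literally
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // not preceded by a letter (L), combining mark (M), digit (N) or "_"
  return new RegExp(`(?<![\\p{L}\\p{M}\\p{N}_])${escaped}`, "iu");
}

// No g flag: with g, test() continues from lastIndex on the next call,
// so reusing one regex object can give alternating results.
const regexList = [unicodeWordStart("žaba"), unicodeWordStart("češnje")];

console.log(regexList[0].test("žaba"));         // true
console.log(regexList[0].test("Žaba skače"));   // true (i flag)
console.log(regexList[1].test("Zrele češnje")); // true
console.log(regexList[0].test("krastačažaba")); // false: preceded by "a"
```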
Apart from the \b, your regex works with literal Unicode characters, but you could also use Unicode code point escapes: /\u017eaba/. Escapes have the advantage of stating exactly which character is to be matched, which is especially helpful for characters that look similar or identical to regular ASCII characters. If you are interested, you can convert characters here: https://www.branah.com/unicode-converter
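A short sketch showing that the escape and the literal character are interchangeable, plus a hypothetical helper, `toUnicodeEscapes`, that does roughly what an online converter does: it turns each non-ASCII character into a lowercase `\uXXXX` escape (or `\u{...}` above U+FFFF). It assumes plain words as input and does not escape regex metacharacters.

```javascript
// "\u017e" is U+017E LATIN SMALL LETTER Z WITH CARON, i.e. "ž".
console.log(/\u017eaba/.test("žaba"));   // true
console.log(/\u{17e}aba/u.test("žaba")); // true: braced form needs the u flag

// Hypothetical helper: replace each non-ASCII character with an escape
// usable in a regex. Plain words only: regex metacharacters are not escaped.
function toUnicodeEscapes(text) {
  let out = "";
  for (const ch of text) { // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      out += ch; // ASCII stays as-is
    } else if (cp <= 0xffff) {
      out += "\\u" + cp.toString(16).padStart(4, "0");
    } else {
      out += "\\u{" + cp.toString(16) + "}"; // astral: needs the u flag
    }
  }
  return out;
}

console.log(toUnicodeEscapes("žaba"));   // \u017eaba
console.log(toUnicodeEscapes("češnje")); // \u010de\u0161nje

const cesnje = new RegExp(toUnicodeEscapes("češnje"), "i");
console.log(cesnje.test("Češnje"));      // true
```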