I have a regex:
/[a-zA-Zɑôáīúȑìêɑ͡iɑ͡uŋġḧn̐ƞġg̶̃čḣñt́d́ŕŕńȶv̈m̈ᵯǰɏæǽÿẇẏs̃śś̶]+/gm
which works well, except for one character that I can't include (or that doesn't behave as expected when included). The character is the last one in the regex's character class:
ś̶
// It adds the strike-through (not easily visible in some fonts); in Unicode it is 'COMBINING LONG STROKE OVERLAY' (U+0336).
My regex captures the character but splits any word that contains it:
"mokk̇ś̶ḣô".match(/[a-zA-Zɑôáīúȑìêɑ͡iɑ͡uŋġḧn̐ƞġčḣñt́d́ŕŕńȶv̈m̈ᵯǰɏæǽÿẇẏs̃śś̶g̶̃]+/gm)
// == ['mokk', 'ś̶ḣô']
I've heard about Unicode property escapes, written \p{UnicodePropertyValue} and used with the u flag. Would they be useful here?
CodePudding user response:
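It doesn't seem to be related to the ś̶ character. As you said yourself, it is being captured. The split happens because the class is missing another character: k̇. Unicode has no precomposed k̇, so it is a k followed by COMBINING DOT ABOVE (U+0307), and without the u flag every code unit in a character class is a separate member. The match therefore stops at the U+0307, which the class doesn't contain, and a new match starts right after it. Adding k̇ to the class fixes it: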
// Without k̇ in the class: the word is split
console.log("mokk̇ś̶ḣô".match(/[a-zA-Zɑôáīúȑìêɑ͡iɑ͡uŋġḧn̐ƞġčḣñt́d́ŕŕńȶv̈m̈ᵯǰɏæǽÿẇẏs̃śś̶g̶̃]+/gm))
// With k̇ added to the class
console.log("mokk̇ś̶ḣô".match(/[a-zA-Zɑôáīúȑìêɑ͡iɑ͡uŋġḧn̐ƞġčḣñt́d́ŕŕńȶv̈m̈ᵯǰɏæǽÿẇẏs̃śś̶k̇g̶̃]+/gm))
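As for the Unicode property escapes mentioned in the question, they could avoid listing every base letter and combining mark by hand. The following is only a minimal sketch, not a drop-in replacement for the regex above: the sample word is a hypothetical spelling written with explicit escapes (its exact code points are an assumption, since the original string may use precomposed letters), and the names word, codePoints, wordPattern and matches are made up for illustration.

```javascript
// Hypothetical sample word with explicit escapes so the combining marks are visible:
// k + U+0307, s + U+0301 + U+0336, h + U+0307, o + U+0302.
const word = "mokk\u0307s\u0301\u0336h\u0307o\u0302";

// Each combining mark is its own code point, so a hand-written character
// class has to allow it separately from its base letter.
const codePoints = [...word].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints.join(" "));

// \p{L} matches any letter and \p{M} any combining mark; both require the u flag.
const wordPattern = /[\p{L}\p{M}]+/gu;

// The word stays in one piece because U+0307 and U+0336 are both marks.
const matches = `one ${word} two`.match(wordPattern);
console.log(matches);
```

Note that this pattern is broader than the original class: it accepts letters and marks from any script, so it may not be appropriate if only the listed characters should match.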