I'm trying to use a regular expression to match hashtags. When the hashtag is in English or Chinese, my code works fine. But when it is in Bengali, my code doesn't match the whole Bengali word.
Here is the code I'm testing with:
<?php
$hashtag = '#আয়াতুল_কুরসি';
preg_match('/(#\w+)/u', $hashtag, $matches);
print_r($matches);
?>
And the result is:
Array
(
[0] => #আয়
[1] => #আয়
)
I tried changing the pattern to '/(#\p{L}+)/u', but that didn't help.
Answer:
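The problem is that \w does not match the combining marks (vowel signs and other diacritics) that Bengali words contain. With the u modifier it matches letters, digits, and the underscore, but not characters in the Unicode mark category \p{M}, so the match stops at the first such mark. The same is true of \p{L}, which covers letters only. You need to allow the marks as well: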
preg_match('/#[\w\p{M}]+/u', $hashtag, $matches);
See the PHP demo.
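For reference, here is a minimal sketch of how the same character class could be used to pull every hashtag out of a longer string. The helper name extractHashtags and the sample sentence are made up for illustration and are not part of the original question; the pattern itself is the one from the answer above.

```php
<?php
// Hypothetical helper (name is an assumption, not from the original post).
// Returns every hashtag found in $text as a list of strings.
function extractHashtags(string $text): array
{
    // '#' followed by one or more word characters (\w) or
    // Unicode combining marks (\p{M}); the u modifier enables UTF-8 mode.
    preg_match_all('/#[\w\p{M}]+/u', $text, $matches);

    // $matches[0] holds the full-pattern matches.
    return $matches[0];
}

$hashtag = '#আয়াতুল_কুরসি';
$text = "read $hashtag and #php today"; // illustrative sample input
$tags = extractHashtags($text);
print_r($tags);
```

Adding \p{M} to the class, rather than replacing \w, keeps the existing behavior for English and Chinese hashtags and only extends it to scripts that rely on combining marks.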