Regular expression can't match the whole Bengali word

Time:12-28

I'm trying to use a regular expression to match hashtags. When the hashtag is in English or Chinese, my code works fine, but when it is in Bengali, the pattern matches only the first part of the word.

Here is the code I'm testing with:

<?php

$hashtag = '#আয়াতুল_কুরসি';

preg_match('/(#\w+)/u', $hashtag, $matches);

print_r($matches);

?>

And the result is:

Array
(
    [0] => #আয়
    [1] => #আয়
)

I tried changing the pattern to '/(#\p{L}+)/u', but that didn't help.

Answer:

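The problem is that \w here does not match all of the diacritics (combining marks, Unicode category M) that Bengali words contain, so the match stops at the first one it cannot consume. \p{L} matches letters only, which is why that change did not help either. You need to allow the marks explicitly by adding \p{M} to the character class: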

preg_match('/#[\w\p{M}]+/u', $hashtag, $matches);

See the PHP demo.
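If you need every hashtag in a longer piece of text rather than a single match, the same character class can be used with preg_match_all. The following is only a minimal sketch: the function name extract_hashtags and the sample sentence are made up for illustration and do not come from the original post.

```php
<?php
// Hypothetical helper (not from the original post): returns every hashtag found in $text.
function extract_hashtags(string $text): array
{
    // \w covers letters, digits and the underscore; \p{M} adds combining marks
    // such as Bengali vowel signs. The /u flag makes PCRE treat the pattern
    // and the subject as UTF-8.
    if (preg_match_all('/#[\w\p{M}]+/u', $text, $matches) === false) {
        return []; // PCRE error, for example invalid UTF-8 in $text
    }
    return $matches[0]; // full matches only; the pattern has no capture group
}

$hashtag = '#আয়াতুল_কুরসি';
print_r(extract_hashtags("First tag: $hashtag, second tag: #php_regex."));
```

Because punctuation such as commas and full stops is in neither \w nor \p{M}, a tag followed directly by punctuation should end at the last letter or mark.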
