In PHP,
mb_strtolower('İspanyolca');
returns
U+0069 i LATIN SMALL LETTER I
U+0307 ̇ COMBINING DOT ABOVE
U+0073 s LATIN SMALL LETTER S
U+0070 p LATIN SMALL LETTER P
etc.
I need to get rid of the U+0307 COMBINING DOT ABOVE.
I tried this:
$TheUrl=mb_strtolower('İspanyolca');
$TheUrl=normalizer_normalize($TheUrl,Normalizer::FORM_C);
The combining dot above persists.
Any help would be appreciated.
CodePudding user response:
You can try a custom PHP function that performs Unicode normalization and then removes every character that is not part of the basic Latin alphabet. For example:
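function removeDiacritics($str) {
    // Requires the intl extension. FORM_D splits accented letters into base letter + combining mark.
    $normalizedStr = Normalizer::normalize($str, Normalizer::FORM_D);
    // Drops everything outside a-z/A-Z: the combining marks, but also digits, spaces, and punctuation.
    $cleanStr = preg_replace('/[^a-zA-Z]/', '', $normalizedStr);
    return $cleanStr;
}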
$TheUrl = mb_strtolower('İspanyolca');
$TheUrl = removeDiacritics($TheUrl);
echo $TheUrl;
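If the URL also needs to keep digits or hyphens, a narrower variation on the same idea may fit better: decompose the string and strip only the combining marks instead of everything outside a-z. The following is a minimal sketch, assuming the intl extension is loaded; the helper name stripCombiningMarks is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a built-in); needs the intl extension for Normalizer.
function stripCombiningMarks(string $str): string
{
    // Decompose so that accents become separate combining code points.
    $decomposed = Normalizer::normalize($str, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $str; // normalization failed (e.g. invalid UTF-8); leave input untouched
    }
    // \p{Mn} matches nonspacing marks such as U+0307; the /u flag enables UTF-8 mode.
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $decomposed;
    // Recompose whatever is left into NFC.
    $recomposed = Normalizer::normalize($stripped, Normalizer::FORM_C);
    return $recomposed === false ? $stripped : $recomposed;
}

$TheUrl = stripCombiningMarks(mb_strtolower('İspanyolca-2024', 'UTF-8'));
echo $TheUrl; // ispanyolca-2024
```

Note that this removes every nonspacing mark, so a letter such as 'é' becomes 'e' as well; whether that is wanted depends on the URL scheme.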
CodePudding user response:
To handle this case, you can use the strtr function to replace specific characters in the string, as in the example below:
$TheUrl = 'İspanyolca';
$TheUrl = mb_strtolower($TheUrl, 'UTF-8');
$TheUrl = strtr($TheUrl, array('i̇' => 'i', 'İ' => 'i'));
This replaces both the lowercase 'i' followed by a combining dot above and the uppercase 'İ' with a regular lowercase 'i'.
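To check that the replacement worked, the result can be dumped code point by code point, much like the listing in the question. This is only a sketch: the helper name codePoints is hypothetical, and it assumes PHP 7.4 or later for mb_str_split (mb_ord needs 7.2 or later).

```php
<?php
// Hypothetical helper (not a built-in): list the code points of a UTF-8 string as U+XXXX.
function codePoints(string $str): array
{
    $out = [];
    foreach (mb_str_split($str, 1, 'UTF-8') as $char) {
        $out[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return $out;
}

// The input is written with an escape so the combining mark is visible in the source.
$TheUrl = "i\u{0307}spanyolca";
$TheUrl = strtr($TheUrl, ["i\u{0307}" => 'i']);

// Prints the code points of "ispanyolca"; U+0307 should no longer appear.
echo implode(' ', codePoints($TheUrl));
```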