I want to use my own function that inspects each character of a string. My data is UTF-8. I am not looking for an alternative based on str_replace or preg_match; I want to understand why this does not work.
`
function GarderCaractereSimple($chaineIn)
{
// Keep only letters and digits
// Convert accented characters to plain characters
$TabCarSimple = array('a'=>'a','b'=>'b','c'=>'c','d'=>'d','e'=>'e','f'=>'f','g'=>'g','h'=>'h','i'=>'i','j'=>'j','k'=>'k','l'=>'l','m'=>'m','n'=>'n','o'=>'o','p'=>'p','q'=>'q','r'=>'r','s'=>'s','t'=>'t','u'=>'u','v'=>'v','w'=>'w','x'=>'x','y'=>'y','z'=>'z','A'=>'A','B'=>'B','C'=>'C','D'=>'D','E'=>'E','F'=>'F','G'=>'G','H'=>'H','I'=>'I','J'=>'J','K'=>'K','L'=>'L','M'=>'M','N'=>'N','O'=>'O','P'=>'P','Q'=>'Q','R'=>'R','S'=>'S','T'=>'T','U'=>'U','V'=>'V','W'=>'W','X'=>'X','Y'=>'Y','Z'=>'Z','0'=>'0','1'=>'1','2'=>'2','3'=>'3','4'=>'4','5'=>'5','6'=>'6','7'=>'7','8'=>'8','9'=>'9','é'=>'e','è'=>'e','à'=>'a','ç'=>'c','ù'=>'u','ê'=>'e','ï'=>'i','ë'=>'e','ô'=>'o','ö'=>'o','_'=>'_','-'=>'-');
$length = strlen($chaineIn);
$chaineOut = "";
for($i=0; $i<$length; $i++)
{
//if(in_array($chaineIn[$i],$TabCarSimple)) //same problem
if(isset( $TabCarSimple[$chaineIn[$i]] ))
$chaineOut .= $chaineIn[$i];
}
return $chaineOut;
}
$test ="tést";
echo GarderCaractereSimple($test);
`
I expect the result test, but I get tst.
I tried this on my server and on an online PHP sandbox, and the problem is the same.
Why can't I read $TabCarSimple[$chaineIn[$i]] correctly when $chaineIn[$i] is an accented character?
Thank you
CodePudding user response:
You should use the native multi-byte string functions when dealing with UTF-8. For example, replace strlen with mb_strlen. strlen counts bytes, not characters, and an accented character such as é takes two bytes in UTF-8, so strlen reports 5 for "tést" rather than 4. Likewise, $chaineIn[$i] returns a single byte, which can never match a two-byte key such as 'é' in $TabCarSimple, so the accented character is dropped.
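To make the byte-versus-character point concrete, here is a small sketch that uses only core PHP functions. It writes é as the escape "\u{00E9}" (PHP 7 or later) so the result does not depend on how the file is saved; the variable name $test is taken from the question.

```php
<?php
$test = "t\u{00E9}st"; // "tést", with é written as an explicit code point

echo strlen($test), "\n";      // 5: strlen counts bytes, and é takes two
echo bin2hex($test), "\n";     // 74c3a97374: é is the byte pair c3 a9
echo bin2hex($test[1]), "\n";  // c3: $test[1] is only the first byte of é

// A single byte can never equal the two-byte string "é",
// so a lookup keyed on 'é' fails for both halves of the character.
var_dump($test[1] === "\u{00E9}"); // bool(false)
```

The loop in the question therefore visits the two bytes of é separately, finds neither in the table, and skips both.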
The mb_ functions require the mbstring PHP extension to be installed.
In short, the accented characters you mention are multi-byte in UTF-8, and strlen is not multi-byte aware.
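As a rough sketch of how the loop could look with the mb_ functions, the variant below walks the string character by character. The function name keepSimpleCharacters and the shortened $map are hypothetical, introduced only for illustration. It also appends the mapped value rather than the original character, on the assumption that the goal is to get test from tést, as the comment in the question suggests.

```php
<?php
// Hypothetical multi-byte-aware variant; requires the mbstring extension.
function keepSimpleCharacters(string $input, array $map): string
{
    $output = '';
    $length = mb_strlen($input, 'UTF-8'); // number of characters, not bytes
    for ($i = 0; $i < $length; $i++) {
        $char = mb_substr($input, $i, 1, 'UTF-8'); // one whole character
        if (isset($map[$char])) {
            $output .= $map[$char]; // append the mapped (unaccented) value
        }
    }
    return $output;
}

// Shortened table for illustration; the full table from the question is used the same way.
$map = ['t' => 't', 's' => 's', "\u{00E9}" => 'e'];

echo keepSimpleCharacters("t\u{00E9}st", $map), "\n"; // test
```

Calling mb_substr inside the loop is simple to read but rescans the string on each iteration, so for long inputs it may be worth splitting the string into characters once instead.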
Here's a friendly example:
echo strlen('