Iterating over a string with my own PHP function: accent problem

Time:01-20

I want to use my own function that inspects each character of a string. I'm working in UTF-8. I don't want an alternative based on str_replace or preg_match; I want to understand why this isn't working.

`
function GarderCaractereSimple($chaineIn)
{
// Keep only letters and digits
// Convert accented characters to their plain equivalents

$TabCarSimple = array('a'=>'a','b'=>'b','c'=>'c','d'=>'d','e'=>'e','f'=>'f','g'=>'g','h'=>'h','i'=>'i','j'=>'j','k'=>'k','l'=>'l','m'=>'m','n'=>'n','o'=>'o','p'=>'p','q'=>'q','r'=>'r','s'=>'s','t'=>'t','u'=>'u','v'=>'v','w'=>'w','x'=>'x','y'=>'y','z'=>'z','A'=>'A','B'=>'B','C'=>'C','D'=>'D','E'=>'E','F'=>'F','G'=>'G','H'=>'H','I'=>'I','J'=>'J','K'=>'K','L'=>'L','M'=>'M','N'=>'N','O'=>'O','P'=>'P','Q'=>'Q','R'=>'R','S'=>'S','T'=>'T','U'=>'U','V'=>'V','W'=>'W','X'=>'X','Y'=>'Y','Z'=>'Z','0'=>'0','1'=>'1','2'=>'2','3'=>'3','4'=>'4','5'=>'5','6'=>'6','7'=>'7','8'=>'8','9'=>'9','é'=>'e','è'=>'e','à'=>'a','ç'=>'c','ù'=>'u','ê'=>'e','ï'=>'i','ë'=>'e','ô'=>'o','ö'=>'o','_'=>'_','-'=>'-');


$length = strlen($chaineIn);
$chaineOut = "";

for($i=0; $i<$length; $i++)
{
    //if(in_array($chaineIn[$i],$TabCarSimple))  //same problem
    if(isset( $TabCarSimple[$chaineIn[$i]] ))
    $chaineOut .= $chaineIn[$i];
}


return $chaineOut;
}

$test ="tést";
echo GarderCaractereSimple($test);
`

I expect the result test, but I get tst.

I tried this on my server and in an online PHP sandbox, and I get the same problem in both.

Why can't I look up $TabCarSimple[$chaineIn[$i]] correctly when $chaineIn[$i] is an accented character?

Thank you

CodePudding user response:

You should use PHP's native multi-byte string functions when dealing with UTF-8. For example, replace strlen with mb_strlen. Accented characters such as é are encoded as more than one byte in UTF-8 (two bytes for é), and strlen counts bytes, not characters, so each accented character is counted more than once.
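To make the byte/character difference concrete, here is a minimal sketch. It assumes the script file is saved as UTF-8 and that the mbstring extension is loaded; the variable name is taken from the question.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$test = "tést";

echo strlen($test), "\n";              // 5: counts bytes ('é' takes 2 bytes in UTF-8)
echo mb_strlen($test, 'UTF-8'), "\n";  // 4: counts characters

// Byte-wise indexing splits 'é' into its two bytes.
echo bin2hex($test[1]), ' ', bin2hex($test[2]), "\n"; // c3 a9
```

Neither of those two bytes on its own equals the string 'é', which is why a byte-by-byte loop never sees the accented character as a whole.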

The mb_ functions require the mbstring PHP extension to be installed.
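If you are unsure whether the extension is available on your server, a quick check along these lines may help. extension_loaded() is a core PHP function; the variable name and messages are only illustrative.

```php
<?php
// $hasMbstring is a hypothetical variable name used for illustration.
$hasMbstring = extension_loaded('mbstring');

if ($hasMbstring) {
    echo "mbstring is loaded: mb_strlen() and mb_substr() can be used\n";
} else {
    echo "mbstring is not loaded: the mb_ functions are unavailable\n";
}
```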

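This also explains the failed lookup: $chaineIn[$i] returns a single byte, not a character. For é the loop sees two separate bytes, neither of which is a key in $TabCarSimple, so isset() returns false and the character is dropped.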
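Putting this together, a character-aware version of the loop could look like the sketch below. It is not the only way to do it. It assumes UTF-8 input, a UTF-8 source file, and the mbstring extension; GarderCaractereSimpleUtf8 is a hypothetical name, and the accent pairs are the ones from the question. Note a second detail: to get test rather than tést, the loop has to append the mapped value, not the original character.

```php
<?php
// Sketch only. Assumes UTF-8 input, a UTF-8 source file and the mbstring extension.
// GarderCaractereSimpleUtf8 is a hypothetical name, not part of the original post.
function GarderCaractereSimpleUtf8(string $chaineIn): string
{
    // Accented characters mapped to plain ones (same pairs as in the question).
    $accents = ['é'=>'e','è'=>'e','à'=>'a','ç'=>'c','ù'=>'u',
                'ê'=>'e','ï'=>'i','ë'=>'e','ô'=>'o','ö'=>'o'];

    // ASCII characters that are kept unchanged.
    $keep = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-';

    $length = mb_strlen($chaineIn, 'UTF-8'); // number of characters, not bytes
    $chaineOut = "";

    for ($i = 0; $i < $length; $i++) {
        // Extract one whole character, however many bytes it occupies.
        $char = mb_substr($chaineIn, $i, 1, 'UTF-8');

        if (isset($accents[$char])) {
            $chaineOut .= $accents[$char];   // append the plain replacement
        } elseif (strpos($keep, $char) !== false) {
            $chaineOut .= $char;             // keep letters, digits, '_' and '-'
        }
        // Anything else is dropped.
    }

    return $chaineOut;
}

echo GarderCaractereSimpleUtf8("tést"), "\n"; // test
```

Calling mb_substr inside the loop rescans the string on each iteration, which is fine for short inputs. On PHP 7.4 or later, mb_str_split($chaineIn) returns the characters as an array in one pass and can be iterated with foreach instead.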

Here's a friendly example:

echo strlen('           