I am creating a simple registration form in which you must type your nickname into a textarea. The problem occurs when the user enters Polish characters (ś, ż, ć, etc.). When I echo the whole string it looks just as it should, but when I try to display only one character, it shows this weird � symbol.
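function ech($nam)
{
    echo $nam;    // prints the whole string
    echo "<br>";  // was "</br>", which is not a valid tag
    echo $nam[1]; // prints only the byte at offset 1
}
$te = $_POST['sth']; // $te equals "śżć" now
ech($te);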
Output:
śżć
�
CodePudding user response:
Using an offset on a string, like $nam[1], returns only a single byte, but in UTF-8 each of these characters takes multiple bytes. Use multibyte-safe string functions instead, for example mb_substr($nam, 1, 1) to get the second character (or mb_substr($nam, 0, 1) for the first).
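To make the difference concrete, here is a minimal sketch. It assumes the input is UTF-8, that the mbstring extension is enabled, and that the script file itself is saved as UTF-8. The literal "śżć" stands in for $_POST['sth'], and the variable names are only illustrative.

```php
<?php
// Stand-in for $_POST['sth']; assumes UTF-8 input.
$nam = "śżć";

// Byte-based access: offset 1 is the second BYTE of "ś", not a character.
$secondByte = $nam[1];

// Character-based access: start at character 1, take 1 character.
$secondChar = mb_substr($nam, 1, 1, 'UTF-8');

// The two length functions disagree for the same reason.
$bytes  = strlen($nam);             // counts bytes
$length = mb_strlen($nam, 'UTF-8'); // counts characters

echo $secondChar, "\n"; // ż
echo $bytes, "\n";      // 6
echo $length, "\n";     // 3
```

Passing the encoding explicitly is optional, but it avoids depending on whatever mb_internal_encoding() is set to on the server.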
- PHP strings are byte arrays (in contrast to, for example, JavaScript, where strings are sequences of UTF-16 code units). In UTF-8 the string "ś" contains 2 bytes: strlen("ś") gives you 2 and bin2hex("ś") gives you "c59b". When you do $str[0] you are fetching only the first of the 2 bytes that make up ś, and that byte on its own is not valid UTF-8, hence the � when you echo $str[0]. (FWIW, echo $str[0].$str[1] would also work, because ś happens to be 2 bytes and you would be manually fetching both of them.)
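The byte-level claims above can be checked with a short sketch like the following. It assumes UTF-8 source and input and the mbstring extension. The last part, splitting with preg_split and the /u modifier, is one possible way to get an array of whole characters and is not something the answer above prescribes.

```php
<?php
$str = "ś"; // U+015B, encoded in UTF-8 as the two bytes c5 9b

$byteCount = strlen($str);  // 2
$hex       = bin2hex($str); // "c59b"

// The first byte alone is an incomplete UTF-8 sequence.
$firstByteIsValid = mb_check_encoding($str[0], 'UTF-8'); // false

// Gluing both bytes back together restores the character.
$rebuilt = $str[0] . $str[1];

// Split a string into whole characters; the /u modifier makes
// PCRE treat the subject as UTF-8 rather than as raw bytes.
$chars = preg_split('//u', "śżć", -1, PREG_SPLIT_NO_EMPTY);

echo $hex, "\n";           // c59b
echo $rebuilt, "\n";       // ś
echo $chars[1], "\n";      // ż
```

With $chars in hand, $chars[1] behaves the way $nam[1] was expected to in the question. On PHP 7.4 or later, mb_str_split() is an alternative to the preg_split call.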