I'm trying to decode a text that contains extended ASCII characters but when I try to convert the character I get the wrong value. Like this:
echo "“<br>";
echo ord("“")."<br>";
echo chr(ord("“"))."<br>";
And this is my output:
“
226
�
The ASCII value of the character "“" is 147, not 226. And instead of the � symbol, I want to get the "“" character back.
I'm using UTF-8
<meta charset="utf-8">
I have tried changing to different charsets but it didn't work.
CodePudding user response:
1st: U+201C Left Double Quotation Mark is the UTF-8 byte sequence E2 80 9C (hexadecimal), i.e. decimal 226 128 156.
2nd: ord converts the first byte of a string to a value between 0 and 255.
Result: ord("“") returns 226
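To make the point about the first byte concrete, here is a minimal sketch that lists every byte of the character. It writes the character as the escape "\u{201C}" (an assumption that PHP 7.0 or later is available) so the result does not depend on how the source file is saved; the variable names are illustrative only.

```php
<?php
// U+201C LEFT DOUBLE QUOTATION MARK, written as an escape so the
// file's own encoding does not matter (requires PHP 7.0+).
$quote = "\u{201C}";

// unpack('C*') returns each byte as an unsigned integer (keys start at 1).
$bytes = array_values(unpack('C*', $quote));

echo implode(' ', $bytes), "\n"; // 226 128 156
echo ord($quote), "\n";          // 226: ord() only looks at the first byte
echo strlen($quote), "\n";       // 3: strlen() counts bytes, not characters
```

Since chr(226) produces only the first of those three bytes, the browser receives an incomplete UTF-8 sequence and shows the � replacement symbol.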
…
Instead of the ord and chr pair, use mb_ord and its complement mb_chr, e.g. as follows:
<?php
echo "“<br>";
echo mb_ord("“")."<br>";
echo mb_chr(mb_ord("“"))."<br>";
?>
Result: .\SO\74045685.php
“
8220
“
Edit: you can get the Windows-1251 code (147) for the character “ (U+201C, Left Double Quotation Mark) as follows:
echo ord(mb_convert_encoding("“","Windows-1251","UTF-8")); //147
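The reverse direction, from the legacy code 147 back to the character, can be sketched the same way. This assumes the number really comes from a Windows-1251 text, as in the line above, and that the mbstring extension with mb_ord is available (PHP 7.2+); the variable names are illustrative only.

```php
<?php
// Assumption: 147 is a Windows-1251 byte value, not a Unicode code point.
$legacyByte = chr(147);

// Re-encode the single legacy byte as UTF-8.
$utf8 = mb_convert_encoding($legacyByte, 'UTF-8', 'Windows-1251');

echo bin2hex($utf8), "\n";         // e2809c
echo mb_ord($utf8, 'UTF-8'), "\n"; // 8220, i.e. U+201C
```

Echoing $utf8 on a page declared as UTF-8 should then display “ rather than �.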
CodePudding user response:
You're incorrect about the “ character: in Unicode, code point 147 (U+0093) is a control character whose UTF-8 encoding is two bytes, c293.
See: SET TRANSMIT STATE.
In the manual for ord() it says:
However, note that this function is not aware of any string encoding, and in particular will never identify a Unicode code point in a multi-byte encoding such as UTF-8 or UTF-16.
On top of this, if I actually convert the '“' character to hexadecimal, I get e2809c. So it's a triplet. Never trust what you read online.
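The two byte sequences mentioned in this answer belong to two different code points, which a short check can show. This is a sketch assuming the mbstring extension and PHP 7.2+ for mb_chr; the variable names are illustrative only.

```php
<?php
// U+0093 is the C1 control character SET TRANSMIT STATE.
$controlChar = mb_chr(0x93, 'UTF-8');

// U+201C is LEFT DOUBLE QUOTATION MARK.
$leftQuote = mb_chr(0x201C, 'UTF-8');

echo bin2hex($controlChar), "\n"; // c293   (two bytes)
echo bin2hex($leftQuote), "\n";   // e2809c (three bytes)
```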