I have an ASCII string. I like to change its encoding to utf-8. But I found there's a simple function to change ascii to utf-8 in php. and vice verse, I like to change utf-8 alphabet to ascii. Please advise.
I have tried:
<?php
// utf-8
$str = "CHONKIOK";
// I can't even how to print these utf-8 characters in php. I just copied/pasted the string.
// strlen($str) => 24 bytes
// mb_detect_encoding($str) => utf-8
$str2 = "CHONKIOK";
// strlen($str2) => 8 bytes
// mb_detect_encoding($str2) => ascii
// change ascii to utf-8
$str = mb_convert_encoding($str2, "UTF-8");
echo mb_detect_encoding($str);
// returns ascii
CodePudding user response:
What you are doing is correct. As per mb_detect_encoding it states that it detects the most likely character encoding.
As the entire ASCII set is contained within UTF-8 at the exact same character positions, this function is telling you that it's an ASCII string because it technically is. The bytes of this string when encoded in both ASCII and UFT-8 are identical.
As you've found, when you include some characters outside of the ASCII set then it will give you the next probable encoding.
What exactly should I do to obtain this string: "CHONKIOK" from "CHONKIOK"?
The characters you're after are called "Fullwidth Latin" characters.
Given the C
character provided is character 65,315 and a regular C
is character 67, you could possible obtain the strings you're after by adding the difference of 65,248. This is only possible because the alphabet tends to repeat in the same order throughout different parts of the character charts.
You can get the code point of a character using mb_ord and convert it back to a character using mb_chr, after adding 65,248.
That might look something like:
$str_input = "ABC abc 123";
$convertable = "ABCDEFG12349abcdefg";
$str_output = "";
for ($i = 0; $i < strlen($str_input); $i ) {
$char = mb_ord($str_input[$i], "UTF-8");
if(str_contains($convertable, $str_input[$i])) $char = 65248;
$str_output .= mb_chr($char, "UTF-8");
}
echo $str_output; // outputs "ABC abc 123"
Just be sure to include the whole alphabet in $convertable
CodePudding user response:
try this to convert to utf-8:
utf8_encode(string $string): string
try this to convert to ASCII:
utf8_decode(string $string): string