How to read a file as UTF-16 with PHP stream wrappers

Time:12-13

Is there a way to read a file in a specific character encoding like UTF-16 using PHP's stream wrappers, in the same way I can read a base64-encoded file using php://filter/convert.base64-decode/resource=file.txt?

Answer:

PHP strings don't know anything about encodings, so PHP file functions essentially treat every file as a binary file.

If you know that a set of bytes should be read as UTF-16, you can convert it to some other encoding of your choice (here UTF-8, as an example) with any of these functions, depending on which extensions you have installed:

// Requires ext/iconv; arguments are From, To, String
$utf8_string = iconv('UTF-16', 'UTF-8', $utf16_string);
// Requires ext/mbstring; arguments are String, To, From
$utf8_string = mb_convert_encoding($utf16_string, 'UTF-8', 'UTF-16');
// Requires ext/intl; arguments are String, To, From
$utf8_string = UConverter::transcode($utf16_string, 'UTF-8', 'UTF-16');
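Plain "UTF-16" leaves the byte order up to the library, while many real files (for example those saved by Windows tools) are little-endian and start with a byte order mark (BOM). One option is to name UTF-16LE or UTF-16BE explicitly. The sketch below assumes ext/iconv and uses a hypothetical helper, `utf16_to_utf8()`, that picks the byte order from the BOM; treating BOM-less input as big-endian is an assumption you may need to change for your files.

```php
<?php
// Hypothetical helper (not part of PHP): decode UTF-16 bytes to UTF-8,
// choosing the byte order from the byte order mark (BOM) when present.
// Requires ext/iconv.
function utf16_to_utf8(string $bytes): string
{
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        // Little-endian BOM: strip it and decode the rest as UTF-16LE
        $result = iconv('UTF-16LE', 'UTF-8', substr($bytes, 2));
    } elseif (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        // Big-endian BOM: strip it and decode the rest as UTF-16BE
        $result = iconv('UTF-16BE', 'UTF-8', substr($bytes, 2));
    } else {
        // Assumption: no BOM means big-endian (the Unicode default);
        // change this if your files are known to be little-endian
        $result = iconv('UTF-16BE', 'UTF-8', $bytes);
    }
    if ($result === false) {
        throw new RuntimeException('Input is not valid UTF-16');
    }
    return $result;
}

echo utf16_to_utf8("\xFF\xFEh\x00i\x00"), "\n"; // hi
```

Stripping the BOM before converting matters because the explicit LE/BE decoders would otherwise keep it as an invisible U+FEFF character at the start of the output.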

Conversely, if you know that the string is in some particular encoding (again, UTF-8 as an example) and want it to be UTF-16, you swap the source and target encodings:

// Requires ext/iconv; arguments are From, To, String
$utf16_string = iconv('UTF-8', 'UTF-16', $utf8_string);
// Requires ext/mbstring; arguments are String, To, From
$utf16_string = mb_convert_encoding($utf8_string, 'UTF-16', 'UTF-8');
// Requires ext/intl; arguments are String, To, From
$utf16_string = UConverter::transcode($utf8_string, 'UTF-16', 'UTF-8');
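Going the other way, the bytes produced for plain "UTF-16" may differ between extensions (byte order, and whether a BOM is written). If whatever consumes the output expects a specific layout, naming it explicitly avoids the guesswork. A minimal sketch, assuming ext/iconv, with a hypothetical helper `utf8_to_utf16le_with_bom()`:

```php
<?php
// Hypothetical helper: encode UTF-8 text as UTF-16LE and prepend a BOM,
// a layout many Windows tools expect. Requires ext/iconv.
function utf8_to_utf16le_with_bom(string $utf8): string
{
    // The explicit UTF-16LE target writes no BOM of its own
    $body = iconv('UTF-8', 'UTF-16LE', $utf8);
    if ($body === false) {
        throw new RuntimeException('Input is not valid UTF-8');
    }
    // "\xFF\xFE" is U+FEFF encoded little-endian
    return "\xFF\xFE" . $body;
}

// "\u{e9}" is é; UTF-16LE encodes U+00E9 as the two bytes E9 00
echo bin2hex(utf8_to_utf16le_with_bom("h\u{e9}")), "\n"; // fffe6800e900
```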

In both cases, the resulting string is just a different sequence of bytes; other PHP functions still won't "know" what it "means".
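To see that a converted string is still just bytes, compare `strlen()`, which counts bytes, with `iconv_strlen()`, which counts characters once it is told the encoding. A short sketch assuming ext/iconv:

```php
<?php
// The UTF-16LE encoding of "hi" is four bytes: 68 00 69 00
$utf16 = iconv('UTF-8', 'UTF-16LE', 'hi');

// strlen() counts bytes, not characters
echo strlen($utf16), "\n";                   // 4
// iconv_strlen() counts characters once told the encoding (ext/iconv)
echo iconv_strlen($utf16, 'UTF-16LE'), "\n"; // 2
```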


The "iconv" extension also provides a stream filter, named convert.iconv.<from>.<to>, which applies the equivalent of the iconv function as a file or stream is being read. So if you have a file which you know should be read as UTF-16, and want its contents as UTF-8, you could write:

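$fp = fopen('php://filter/convert.iconv.utf-16.utf-8/resource=/path/to/utf16-file.txt', 'r');
// fread() returns up to 10 bytes of the converted (UTF-8) output;
// fgets($fp, 10) would stop after 9 bytes or at the first newline
$first_10_bytes_of_utf16_converted_to_utf8 = fread($fp, 10);
fclose($fp);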
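A self-contained way to try the filter, assuming ext/iconv and a writable temp directory, is to create a small UTF-16LE file and read it back with `file_get_contents()`. Naming the source as `utf-16le` rather than `utf-16` is an assumption about the file's byte order; the optional `read=` prefix only makes explicit that the filter applies when reading.

```php
<?php
// Demo: write a small UTF-16LE file, then read it back as UTF-8
// through the convert.iconv filter (requires ext/iconv).
$path = tempnam(sys_get_temp_dir(), 'u16');
// The file's bytes: "h\u{e9}llo" (héllo) encoded as UTF-16LE, no BOM
file_put_contents($path, iconv('UTF-8', 'UTF-16LE', "h\u{e9}llo"));

// Filter name is convert.iconv.<from>.<to>; resource= takes the real path
$utf8 = file_get_contents(
    'php://filter/read=convert.iconv.utf-16le.utf-8/resource=' . $path
);
unlink($path);

echo $utf8, "\n"; // héllo
```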

Or the reverse: a UTF-8 file whose contents you want as UTF-16:

$fp = fopen('php://filter/convert.iconv.utf-8.utf-16/resource=/path/to/utf8-file.txt', 'r');
// Up to 10 bytes of UTF-16 output (possibly starting with a BOM)
$first_10_bytes_of_utf8_converted_to_utf16 = fread($fp, 10);
fclose($fp);
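If the file is already open, the same filter can be attached to the handle with `stream_filter_append()` instead of building a `php://filter` URL. A sketch under the same assumptions (ext/iconv, writable temp directory), using an explicit `utf-16le` target so the output bytes are predictable:

```php
<?php
// Attach the iconv filter to an already-open handle (requires ext/iconv).
$path = tempnam(sys_get_temp_dir(), 'u8');
file_put_contents($path, 'hi');                 // plain UTF-8 (ASCII) file

$fp = fopen($path, 'r');
// STREAM_FILTER_READ: convert data as it is read, not as it is written
stream_filter_append($fp, 'convert.iconv.utf-8.utf-16le', STREAM_FILTER_READ);
$utf16 = stream_get_contents($fp);
fclose($fp);
unlink($path);

echo bin2hex($utf16), "\n"; // 68006900
```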

Again, it's important to remember that PHP is working in bytes, so reading a fixed number of bytes, as above, may return corrupted text: the cut can land in the middle of a multi-byte character (part of a UTF-8 sequence, or half of a UTF-16 code unit or surrogate pair).
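When the whole converted text fits in memory, one way to avoid splitting characters is to read everything first and then cut by characters rather than bytes. This sketch contrasts `substr()` with `iconv_substr()` on UTF-8 text (ext/iconv assumed):

```php
<?php
// "h\u{e9}llo" (héllo) in UTF-8: é is the two bytes C3 A9
$utf8 = "h\u{e9}llo";

// Cutting by bytes can split é, leaving a lone C3 byte
$by_bytes = substr($utf8, 0, 2);
echo bin2hex($by_bytes), "\n";          // 68c3

// Cutting by characters (ext/iconv) keeps é whole
$by_chars = iconv_substr($utf8, 0, 2, 'UTF-8');
echo bin2hex($by_chars), "\n";          // 68c3a9
```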
