Using the sample image from https://www.iptc.org/std-dev/photometadata/examples/google-licensable/example-page1.html and the following code:
getimagesize("sample.jpg", $image_info);
if (isset($image_info["APP13"])) {
    $iptc = iptcparse($image_info["APP13"]);
    var_dump($iptc);
}
In the browser, the output shows this: � Copyright 2020 IPTC (Test Images) - www.iptc.org
That first character is supposed to be the copyright symbol. How do I ensure that special characters aren't converted into �?
Ultimately, the array needs to be passed to json_encode(). I believe these characters are causing problems.
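For context, here is a minimal sketch of why such characters break json_encode(). It assumes the copyright sign is stored as the single ISO-8859-1 byte 0xA9; the sample string is a stand-in, not the actual iptcparse() output.

```php
<?php
// Assumption: the copyright sign arrives as the lone ISO-8859-1 byte 0xA9.
$latin1 = "\xA9 Copyright 2020 IPTC";

// json_encode() expects UTF-8 input; a lone 0xA9 byte is not valid UTF-8,
// so the call fails and returns false instead of a JSON string.
$json = json_encode([$latin1]);
var_dump($json);                    // bool(false)
echo json_last_error_msg(), "\n";   // describes the UTF-8 problem

// mb_check_encoding() can detect the problem before encoding.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
```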
UPDATE 1:
Per the suggestion of @6opko to use utf8_encode, I added this to my code:
array_walk_recursive($iptc, function (&$entry) {
    $entry = utf8_encode($entry);
});
This fixed the problem with the copyright symbol. However, in index ["2#000"][0] of the iptcparse result, I'm getting \u0000\u0004. I feel this might have something to do with the IPTC specification, which I do not understand yet (and it might actually be correct). I'm investigating.
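As far as I understand the IPTC IIM specification, dataset 2:00 is the Record Version, which is stored as a two-byte binary number rather than as text, so \u0000\u0004 would simply be version 4. A minimal sketch of decoding it, using a hard-coded stand-in for $iptc["2#000"][0]:

```php
<?php
// Hypothetical stand-in for $iptc["2#000"][0]: the two raw bytes 0x00 0x04.
$raw = "\x00\x04";

// 'n' reads an unsigned 16-bit integer in big-endian byte order.
// unpack() returns a 1-indexed array when no names are given.
$version = unpack('n', $raw)[1];

var_dump($version); // int(4)
```

If that reading is right, this field should be decoded as a number (or dropped) before json_encode() rather than run through a character-set conversion.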
UPDATE 2:
Since utf8_encode() is deprecated, I tried adding this to my script:
ini_set('default_charset', 'UTF-8');
That didn't work. I changed the array_walk_recursive callback to use mb_convert_encoding($entry, 'UTF-8'), and that also didn't work.
Answer:
So, this is an encoding issue: the IPTC text in this image is most likely stored as ISO-8859-1, while the page is rendered as UTF-8. Since PHP 5.6, default_charset already defaults to UTF-8. For older versions, see this thread to change the default settings.
utf8_encode() is a viable option, but it is deprecated as of PHP 8.2.
So, it is best to use mb_convert_encoding() to convert the string from one character encoding to another, stating the source encoding explicitly.
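A likely reason the two-argument call in UPDATE 2 had no visible effect is that, without a source encoding, mbstring assumes its internal encoding (UTF-8 by default). The sketch below compares both calls on an assumed ISO-8859-1 string; the sample value is illustrative only.

```php
<?php
// Assumed ISO-8859-1 input: 0xA9 is the copyright sign in that encoding.
$latin1 = "\xA9 Copyright";

// No source encoding: mbstring falls back to its internal encoding
// (UTF-8 by default), so 0xA9 is treated as an invalid UTF-8 byte.
$guess = mb_convert_encoding($latin1, 'UTF-8');

// Explicit source encoding: 0xA9 becomes the UTF-8 sequence 0xC2 0xA9.
$fixed = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump($guess === $fixed);             // bool(false)
var_dump(bin2hex(substr($fixed, 0, 2))); // string(4) "c2a9"
```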
Snippet:
<?php
getimagesize("sample.jpg", $image_info);
if (isset($image_info["APP13"])) {
    $iptc = iptcparse($image_info["APP13"]);
    // 2#116 holds the copyright notice; convert it from ISO-8859-1 to UTF-8.
    echo mb_convert_encoding($iptc['2#116'][0], "UTF-8", "ISO-8859-1");
}
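Since the goal is to json_encode() the whole array, the same conversion can be applied to every value. This is a rough sketch, not a definitive implementation: iptcToUtf8 is a hypothetical helper name, the input array is a stand-in for real iptcparse() output, and it assumes any value that is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Hypothetical helper: converts every value of an iptcparse() result to UTF-8.
// Assumption: values that are not already valid UTF-8 are ISO-8859-1.
function iptcToUtf8(array $iptc): array
{
    array_walk_recursive($iptc, function (&$entry) {
        if (!mb_check_encoding($entry, 'UTF-8')) {
            $entry = mb_convert_encoding($entry, 'UTF-8', 'ISO-8859-1');
        }
    });
    return $iptc;
}

// Stand-in for iptcparse() output so the sketch runs without sample.jpg.
$iptc = [
    '2#116' => ["\xA9 Copyright 2020 IPTC (Test Images) - www.iptc.org"],
];

// JSON_UNESCAPED_UNICODE keeps the copyright sign readable instead of \u00a9.
$json = json_encode(iptcToUtf8($iptc), JSON_UNESCAPED_UNICODE);
echo $json, "\n";
```

The mb_check_encoding() guard is a heuristic that avoids double-encoding values that are already UTF-8. If the image declares its character set in the IPTC envelope record, reading that would be more reliable than guessing.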