PHP: iptcparse() and encoding

Time:10-24

Using the sample image from https://www.iptc.org/std-dev/photometadata/examples/google-licensable/example-page1.html and the following code:

getimagesize("sample.jpg", $image_info);
if (isset($image_info["APP13"])) {
    $iptc = iptcparse($image_info["APP13"]);
    var_dump($iptc);
}

In the browser, the output shows this: � Copyright 2020 IPTC (Test Images) - www.iptc.org

That first character is supposed to be the copyright symbol. How do I ensure that special characters aren't converted into �?

Ultimately, the array needs to be json_encode()d. I believe these characters are causing problems.

UPDATE 1:

Per the suggestion of @6opko to use utf8_encode, I added this to my code:

array_walk_recursive($iptc, function (&$entry) {
    $entry = utf8_encode($entry);
});

This fixed the problem with the copyright symbol. However, in the index ["2#000"][0] of iptcparse result, I'm getting \u0000\u0004. I feel this might have something to do with the IPTC specification that I do not understand yet (and it might be correct, actually). I'm investigating.
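A note on that value: in the IPTC IIM layout, dataset 2#000 is the record version, which is stored as a two-byte binary number rather than text, so \u0000\u0004 would simply be version 4. A minimal sketch, assuming that layout and using a hard-coded stand-in for the value returned by iptcparse():

```php
<?php

// Stand-in for $iptc["2#000"][0]; the real value comes from iptcparse().
$raw = "\x00\x04";

// "n" reads an unsigned 16-bit big-endian integer; unpack() indexes results from 1.
$version = unpack('n', $raw)[1];

echo $version, PHP_EOL;
```

If that reading is right, this entry should be decoded as a number (or skipped) rather than passed through a character-set conversion.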

UPDATE 2:

Since utf8_encode() is deprecated, I tried adding this to my script:

ini_set('default_charset', 'UTF-8');

That didn't work. I changed the implementation of the array_walk_recursive to use mb_convert_encoding($entry, 'UTF-8') -- and that also didn't work.

Answer:

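So, this is an encoding issue. iptcparse() returns the bytes exactly as they are stored in the image, and here they appear to be ISO-8859-1, while the page is served as UTF-8 (the default_charset default since PHP 5.6). The single 0xA9 byte for © is not valid UTF-8, so the browser shows �. For older PHP versions, see this thread to change the default settings.

utf8_encode() is a viable option, since it converts ISO-8859-1 to UTF-8, but it is deprecated as of PHP 8.2.

So, it is best to use mb_convert_encoding() and pass the source encoding explicitly as the third argument. Without it, the function assumes the input is already in the internal encoding (UTF-8 by default), which is why the mb_convert_encoding($entry, 'UTF-8') call from Update 2 didn't help.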

Snippet:

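<?php

// getimagesize() fills $image_info with the image's APPn segments.
getimagesize("sample.jpg", $image_info);
if (isset($image_info["APP13"])) {
    $iptc = iptcparse($image_info["APP13"]);
    // 2#116 is the IPTC copyright notice; name the source encoding explicitly.
    echo mb_convert_encoding($iptc['2#116'][0], "UTF-8", "ISO-8859-1");
}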
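
Since the question ultimately needs the whole array passed to json_encode(), the same conversion can be applied to every entry. This is only a sketch: iptc_to_utf8() is a hypothetical helper name, the ISO-8859-1 source encoding is an assumption that holds for this sample but not necessarily for other images, and the input array is a hard-coded stand-in for real iptcparse() output. It needs the mbstring extension.

```php
<?php

/**
 * Hypothetical helper: convert every string in an iptcparse() result
 * from an assumed source encoding to UTF-8.
 */
function iptc_to_utf8(array $iptc, string $from = 'ISO-8859-1'): array
{
    array_walk_recursive($iptc, function (&$entry) use ($from) {
        if (is_string($entry)) {
            $entry = mb_convert_encoding($entry, 'UTF-8', $from);
        }
    });
    return $iptc;
}

// Stand-in for iptcparse() output: "\xA9" is the copyright sign in ISO-8859-1.
$iptc = ['2#116' => ["\xA9 Copyright 2020 IPTC (Test Images) - www.iptc.org"]];

// JSON_UNESCAPED_UNICODE keeps the symbol readable instead of emitting \u00a9.
$json = json_encode(iptc_to_utf8($iptc), JSON_UNESCAPED_UNICODE);
echo $json, PHP_EOL;
```

Without the conversion, json_encode() returns false for this input because the lone 0xA9 byte is not valid UTF-8, which matches the problem described in the question.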