PHP - readfile on accented filenames

Time:06-16

I am trying to display images whose names contain accents.

example of a filename:

Nestlé-Coffee-Mate-Original-Lite-311g.jpg

Originally I was displaying my images by accessing them directly from the public folder

<img src="[sitename]/images/products/Nestlé-Coffee-Mate-Original-Lite-311g.jpg"/>

But only images without accents in their names were displayed, so I had to switch to fetching the images via my backend (PHP).

Now I have got something like:

<img src="[sitename]/api/image/products/Nestlé-Coffee-Mate-Original-Lite-311g.jpg"/>

PHP code:

public function image($folder, $name) {
    header("Content-type: image");
    return readfile(FCPATH . 'images/' . $folder . '/' . $name);
}
  1. Is there any way to access files with accented names directly from the public folder, without going through PHP? Fetching them via PHP slows down image display a bit.
  2. If I really do have to fetch them via PHP because of the accents, what is the proper way to do it?

I have tried using encodeURI on the filename which resulted in something like:

<img src="[sitename]/api/image/products/Nestlé-Coffee-Mate-Original-Lite-311g.jpg" />

and in PHP:

return readfile(FCPATH . 'images/' . $folder . '/' .rawurldecode($name));
//or
return readfile(FCPATH . 'images/' . $folder . '/' .urldecode($name));
//or
return readfile(FCPATH . 'images/' . $folder . '/' .utf8_decode($name));

but none of the above worked.

I have also tried various solutions from StackOverflow but none worked.

Answer:

Common accented characters have two different representations in Unicode, and therefore in UTF-8: composed and decomposed.

  • The composed representation uses a single code point: é (U+00E9, UTF-8 bytes \xC3\xA9)
  • The decomposed representation uses two code points: the base letter e (U+0065, byte \x65) followed by the combining acute accent (U+0301, UTF-8 bytes \xCC\x81)

What you have posted in your question uses the latter, decomposed form. You may, in the short term, get your request to work by supplying the composed form of that character.
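
To check which form a given name is in, something like the following may help. It is only a sketch: it assumes the intl extension is loaded, and the filename is the one from the question, written out in its decomposed form.

```php
<?php
// Assumes the intl extension (Normalizer class) is available.
// The name below spells "é" as "e" + U+0301, i.e. the decomposed form.
$name = "Nestle\u{0301}-Coffee-Mate-Original-Lite-311g.jpg";

// isNormalized() reports whether the string is already in the given form.
$isComposed   = Normalizer::isNormalized($name, Normalizer::FORM_C);
$isDecomposed = Normalizer::isNormalized($name, Normalizer::FORM_D);

var_dump($isComposed, $isDecomposed);
// bool(false)
// bool(true)

// The raw bytes make the difference visible: "65cc81" is "e" + combining accent.
echo bin2hex(substr($name, 5, 3)), PHP_EOL; // 65cc81
```

Running the same check on the names actually stored on disk (for example via scandir()) shows whether the request and the file disagree.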

That said, when accepting filenames with UTF-8 you should make a point of normalizing those names before writing them to disk, and/or creating links, to avoid this and other problems.
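
As a rough illustration of normalizing before storing a name and before building a link, consider the sketch below. The helpers normalizeFilename() and imageUrl() and the host example.test are hypothetical, the choice of the composed form is an assumption (the point is to pick one form and use it everywhere), and the intl extension is required.

```php
<?php
// Hypothetical helper: bring a filename into one agreed form (NFC here)
// before it is written to disk or used in a link.
function normalizeFilename(string $name): string
{
    $normalized = Normalizer::normalize($name, Normalizer::FORM_C);
    if ($normalized === false) {
        // normalize() returns false when the input is not valid UTF-8.
        throw new InvalidArgumentException('Filename is not valid UTF-8');
    }
    return $normalized;
}

// Hypothetical helper: build the public URL, percent-encoding each segment.
function imageUrl(string $baseUrl, string $folder, string $name): string
{
    return rtrim($baseUrl, '/') . '/images/'
        . rawurlencode($folder) . '/'
        . rawurlencode(normalizeFilename($name));
}

// The input uses the decomposed form; the URL carries the composed bytes.
$url = imageUrl('https://example.test', 'products',
    "Nestle\u{0301}-Coffee-Mate-Original-Lite-311g.jpg");
echo $url, PHP_EOL;
// https://example.test/images/products/Nestl%C3%A9-Coffee-Mate-Original-Lite-311g.jpg
```

If the file was saved under the same normalized name, a link built this way should in principle let the web server find it directly, without a PHP round trip, although that still depends on how the filesystem stores the name.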

To normalize to composed (NFC) and decomposed (NFD) forms with the Normalizer class from PHP's intl extension:

$input = "e\xCC\x81";
$norm_c = Normalizer::normalize($input, Normalizer::FORM_C);
$norm_d = Normalizer::normalize($input, Normalizer::FORM_D);

var_dump(
    $input, bin2hex($input),
    $norm_c, bin2hex($norm_c),
    $norm_d, bin2hex($norm_d)
);

Output:

string(3) "é"
string(6) "65cc81"
string(2) "é"
string(4) "c3a9"
string(3) "é"
string(6) "65cc81"

You should also check how your OS and filesystem handle UTF-8 filenames. Some will simply write any series of bytes as the filename, some will be picky and reject names they consider invalid, and others will perform their own normalization that may not necessarily match what you have chosen in your app.
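
If the files already on disk cannot be renamed, one possible stopgap is to try the requested name in each form until one matches. This is a sketch under assumptions: findImage() is a hypothetical helper, the temporary directory stands in for the real image folder, and the intl extension is required.

```php
<?php
// Hypothetical helper: resolve a requested image name to a file on disk,
// trying the name as received and then its composed and decomposed forms.
function findImage(string $dir, string $name): ?string
{
    // Refuse directory separators and dot entries so a request cannot escape $dir.
    if ($name === '' || $name === '.' || $name === '..'
        || strpbrk($name, "/\\") !== false) {
        return null;
    }

    $candidates = [$name];
    foreach ([Normalizer::FORM_C, Normalizer::FORM_D] as $form) {
        $normalized = Normalizer::normalize($name, $form);
        if (is_string($normalized)) { // false means invalid UTF-8
            $candidates[] = $normalized;
        }
    }

    foreach (array_unique($candidates) as $candidate) {
        $path = rtrim($dir, '/') . '/' . $candidate;
        if (is_file($path)) {
            return $path;
        }
    }
    return null;
}

// Demo: the file on disk uses the composed form, the request the decomposed one.
$dir = sys_get_temp_dir() . '/images_' . uniqid();
mkdir($dir);
$onDisk = $dir . "/Nestl\u{00E9}-Coffee-Mate-Original-Lite-311g.jpg";
touch($onDisk);

$found = findImage($dir, "Nestle\u{0301}-Coffee-Mate-Original-Lite-311g.jpg");
$ok = $found !== null && is_file($found);
var_dump($ok);

// Clean up the temporary file and directory.
unlink($onDisk);
rmdir($dir);
```

In a controller like the one in the question, the resolved path could then be passed to readfile() after sending a specific header such as Content-Type: image/jpeg, since "image" on its own is not a complete MIME type.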
