Change iconv() replacement character

Time:04-29

I am using iconv() to transliterate text to ASCII for an API request, like this:

$text = iconv("UTF-8", "ASCII//TRANSLIT", $text);

This works well, but when a character cannot be transliterated it is replaced by a ?. Is there a straightforward way to change the substitution character to something else, such as a space? I know about mb_substitute_character() for the mbstring functions, but it doesn't apply to iconv().

Example:

$text = '? € é î ü π ∑ ñ';
echo iconv("UTF-8", "ASCII//TRANSLIT", $text), PHP_EOL;

Output:

? EUR e i u ? ? n

Desired Output:

? EUR e i u     n

Answer:

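As far as I know, iconv() has no option for choosing the replacement character, but you can work around it with a simple escaping scheme: escape every literal ? (and backslash) before transliterating, replace the unescaped ? characters that iconv() inserts, then unescape. Pass ' ' as the second argument to get spaces, as in the question.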

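function my_translit($text, $replace = '!', $in_charset = 'UTF-8', $out_charset = 'ASCII//TRANSLIT') {
    // escape existing \ and ? so they can be told apart from the ? iconv() inserts
    $res = str_replace(['\\', '?'], ['\\\\', '\\?'], $text);
    // translit (characters without an ASCII equivalent become ?)
    $res = iconv($in_charset, $out_charset, $res);
    if ($res === false) {
        return false;
    }
    // in a single left-to-right pass: unescape \\ and \?, and replace every unescaped ?
    return preg_replace_callback('/\\\\([\\\\?])|\\?/', function ($m) use ($replace) {
        return isset($m[1]) ? $m[1] : $replace;
    }, $res);
}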

$text = '? € é î ü π ∑ ñ \\? \\\\? \\\\\\?';
var_dump(
    $text,
    my_translit($text)
);

Result:

string(36) "? € é î ü π ∑ ñ \? \\? \\\?"
string(29) "? EUR ! ! ! p ! ! \? \\? \\\?"

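I'm not certain why the transliteration differs on my system (é, î and ü come out as ! instead of e, i and u, and π becomes p); most likely it is because //TRANSLIT behaves differently across iconv implementations and locales. Either way, the replacement of ? works.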
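A different technique, not the one used in the answer above, avoids the escaping step entirely: transliterate one character at a time and substitute your own replacement whenever iconv() cannot convert a character. This is only a sketch under two assumptions: the input is valid UTF-8, and the iconv() build marks untransliterable characters with ? (glibc and GNU libiconv do; other implementations may differ). The function name translit_with_replacement is hypothetical.

```php
<?php
// Hypothetical helper (not from the answer). Assumes valid UTF-8 input and an
// iconv() build that emits "?" for characters it cannot transliterate.
function translit_with_replacement(string $text, string $replace = ' '): string
{
    $out = '';
    // Split into characters; the /u modifier makes preg_split work on UTF-8.
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        if (strlen($char) === 1) {
            // A single byte in valid UTF-8 is plain ASCII: keep it, including a literal "?".
            $out .= $char;
            continue;
        }
        // @ suppresses the notice some iconv builds emit for unconvertible input.
        $t = @iconv('UTF-8', 'ASCII//TRANSLIT', $char);
        // No usable transliteration: use the caller's replacement instead.
        $out .= ($t === false || $t === '' || strpos($t, '?') !== false) ? $replace : $t;
    }
    return $out;
}

$text = '? € é î ü π ∑ ñ';
echo translit_with_replacement($text), PHP_EOL;
```

Because ASCII characters never go through iconv(), a literal ? in the input is left alone; the cost is one iconv() call per non-ASCII character.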
