I have a file full of \u codes and want to replace each of them with the corresponding UTF-8 character; for example, "\u00FC" should become "ü".
Here is how far I got:
echo 'f\u00FCr' | perl -C -p -e "s/\\\\(u[0-9A-Fa-f]{4})/ chr(hex(sprintf('0x%s', '00FC'))) /ge"
This will output the expected "für". I just can't figure out how to use the value of the capture group in the sprintf call. $1, $1, \1 and \1 are not working. I guess it will be something very simple, but I don't know what to search for. :-)
Or if there is a better approach for this, please let me know, too!
CodePudding user response:
$1 is correct, although you are mistakenly including the u in the capture.
But you have to be careful about escaping for the shell. You are apparently using sh or similar (based on your need to escape the \), so you have to escape certain characters when using double quotes. That includes $. Your shell is interpolating $1 before perl sees it. Best to use single quotes.
perl -C -pe's/\\u([0-9A-Fa-f]{4})/ chr(hex($1)) /ge'
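To see the quoting difference without involving perl at all, here is a small sketch for a POSIX-like shell. The function name show_quoting is made up for illustration; it is called without arguments so that $1 is empty, which is what an interactive shell typically has.

```shell
# Hypothetical helper: prints the same text once double-quoted, once single-quoted.
show_quoting() {
  # Double quotes: the shell expands $1 (empty here) before the text is used.
  printf '%s\n' "double: chr(hex($1))"
  # Single quotes: the text is passed through untouched, so perl would see $1.
  printf '%s\n' 'single: chr(hex($1))'
}

show_quoting
```

The first line comes out as "double: chr(hex())", which is what perl received in the original attempt; the second keeps the literal $1.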
Note that sprintf('0x%s', '00FC') is equivalent to '0x' . '00FC', but hex doesn't require the leading 0x. '00FC' (and thus $1) is sufficient.