How to convert Hex characters to ASCII using sed & regex

Time:10-12

I have a file with several special characters encoded in hexadecimal (the other words are readable). I would like to use sed to convert them using \xHH, but I can't get a regex to match the hex values and translate them.

If I hard-code the hex value, it works:

[user@Centos7]$ echo "aaa&#xED;aaa" | sed -r 's/&#x([[:xdigit:]]+);/\xED/g'
aaaíaaa

But if I try to reuse the match from my regex to translate it to its ASCII value using \xHH, it fails: the result is a literal x followed by the matched value.

[user@Centos7]$ echo "aaa&#xED;aaa" | sed -r 's/&#x([[:xdigit:]]+);/\x\1/g'
aaaxEDaaa

Any idea how to solve this? Thanks.

CodePudding user response:

You can achieve that with Perl using the HTML::Entities module:

echo 'aaa&#xED;aaa' | perl -MHTML::Entities -CS -pe '$_ = decode_entities($_)'


Here,

  • -CS makes Perl treat STDIN, STDOUT, and STDERR as UTF-8, so the decoded characters are written out correctly.
  • decode_entities($string) replaces the HTML entities found in $string with the corresponding Unicode characters (unrecognized entities are left as is).