I have a file with several special characters encoded as hexadecimal entities (the other words are readable). I would like to use sed to convert them using \xHH, but I can't get the hex digits matched by my regex to be used in the replacement.
If I hard-code the hex value, it works:
[user@Centos7]$ echo "aaa&#xED;aaa" | sed -r 's/&#x([[:xdigit:]]+);/\xED/g'
aaaíaaa
But if I try to reuse the match from my regex to translate it to its ASCII value using \xHH, it fails: the result is a literal x followed by the matched value.
[user@Centos7]$ echo "aaa&#xED;aaa" | sed -r 's/&#x([[:xdigit:]]+);/\x\1/g'
aaaxEDaaa
Any idea how to solve this? Thanks
CodePudding user response:
You can achieve that with Perl using HTML::Entities:
echo 'aaa&#xED;aaa' | perl -MHTML::Entities -CS -pe '$_ = decode_entities($_)'
See the online demo.
Here,
- Due to -CS, Perl allows UTF-8 characters on STDOUT
- The decode_entities($string) routine replaces HTML entities found in $string with the corresponding Unicode character (unrecognized entities are left as is).