Using Hexdump In Bash Script does not correctly handle Unicode Characters

Time:12-22

Using hexdump in a Bash script to show the bytes of a Unicode escape does not work as expected.

I can enter the following, directly in the terminal:

echo -e "\u2022" | hexdump -C

This yields the following expected result:

00000000 e2 80 a2 0a

However, when I run the exact same command in a Bash script, I get the following:

00000000 5c 75 32 30 32 32 0a |\u2022.|

Any ideas what could cause the differing outputs? It seems that three additional bytes appear in the output and the \u escape is not converted.

CodePudding user response:

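If the same echo -e command works in the terminal but not in the script, then the two are most likely not using the same interpreter. Bash has understood the \u escape only since version 4.2; a script run by sh (for example dash) or by an older Bash prints it literally. Try setting the shebang of the script to: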

#!/usr/bin/env bash
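
To confirm that the interpreter is the cause, a small check along these lines can be placed at the top of the script. It is only a sketch: printf_expands_u is a hypothetical helper name, and the test uses \u0041 (the letter A) so that the result does not depend on the locale.

```shell
#!/usr/bin/env bash
# printf_expands_u is a hypothetical helper name, not part of any tool.
# It succeeds when the running shell's printf turns \u0041 into "A".
printf_expands_u() {
    [ "$(printf '\u0041')" = "A" ]
}

if printf_expands_u; then
    printf '%s\n' 'The \u escape is supported by this shell'
else
    printf '%s\n' 'The \u escape is NOT supported; probably sh or an old bash' >&2
fi
```

Note that running the script as "sh script.sh" bypasses the shebang, so the check would then report whatever sh points to.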

That said, here is a workaround for printing the UTF-8 encoding of code points in the range U+0800 to U+FFFF (3 bytes) with a standard shell that supports 32-bit arithmetic:

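# U+2022
u=0x2022

# Build the three UTF-8 bytes as octal escapes: POSIX printf only
# guarantees \ddd (octal) in the format string, not \x (hex).
f=$(printf '\\%03o' \
    "$(( (u & 0xF000 | 0xE0000) >> 12 ))" \
    "$(( (u & 0xFC0  | 0x2000 ) >>  6 ))" \
    "$(( (u & 0x3F   | 0x80   )       ))"
)

# Expand the escapes into raw bytes and append a newline.
printf "$f\\n"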
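
The same bit arithmetic extends to the other UTF-8 lengths. The following is a rough sketch rather than a definitive implementation: utf8_encode is a hypothetical function name, it assumes bash (for local), and it does not reject the surrogate range U+D800 to U+DFFF.

```shell
#!/usr/bin/env bash
# utf8_encode CODEPOINT
# Hypothetical helper: prints the UTF-8 bytes of one code point
# (decimal or 0x-prefixed hex), without a trailing newline.
utf8_encode() {
    local u=$(( $1 ))
    if   [ "$u" -lt 128 ]; then        # 1 byte:  0xxxxxxx
        set -- "$u"
    elif [ "$u" -lt 2048 ]; then       # 2 bytes: 110xxxxx 10xxxxxx
        set -- $(( u >> 6 | 0xC0 )) $(( u & 0x3F | 0x80 ))
    elif [ "$u" -lt 65536 ]; then      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        set -- $(( u >> 12 | 0xE0 )) $(( u >> 6 & 0x3F | 0x80 )) \
               $(( u & 0x3F | 0x80 ))
    elif [ "$u" -lt 1114112 ]; then    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        set -- $(( u >> 18 | 0xF0 )) $(( u >> 12 & 0x3F | 0x80 )) \
               $(( u >> 6 & 0x3F | 0x80 )) $(( u & 0x3F | 0x80 ))
    else
        return 1                       # beyond U+10FFFF
    fi
    # %b expands \0ddd octal escapes, so the bytes never pass
    # through the format string itself.
    printf '%b' "$(printf '\\0%03o' "$@")"
}
```

For example, "utf8_encode 0x2022 | hexdump -C" should show the bytes e2 80 a2, matching the terminal output in the question minus the trailing newline.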