Convert unicode array to string in bash

Time:07-05

I have an array of Unicode code points that I want to convert back into characters and store in a variable as a string. The example below is just the code point array for "Hello World!", but it could contain any Unicode code point (up to 16 bits).

array=( 72 101 108 108 111 32 87 111 114 108 100 33 )

I checked several online resources, but I still can't figure out how to do this. I tried things like:

temp=
for c in ${array[@]}; do
        temp+="\U$c"
done
printf %b "$temp"

I also saw that bash has a newer feature that lets you write either echo -e '\Uxxxxx' or $'\Uxxx', but that doesn't work in my case: even if I iterate over the array and store each code point in a variable i, the quoting prevents bash from expanding the variable, as in echo $'\U$i'. I even tried echo "$'\U$i'".

I'm utterly clueless about how to do this in pure bash in a simple way.

CodePudding user response:

The thing that's messing you up is that your array holds the decimal values of the code points, but the \U notation takes hexadecimal numbers. For example, the first element in the array is "72". In decimal, that's the code for "H", but read as hex it's equivalent to decimal 114, which is the code for "r".
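To make the mismatch concrete, here is a small sketch of both readings of "72". The variable names (hex, misread, right, wrong) are made up for illustration, and it assumes a bash recent enough (4.2 or later) to understand \U in printf %b.

```shell
# "72" is a decimal code point, but \U reads its digits as hexadecimal.
printf -v hex '%x' 72              # decimal 72 written in hex
misread=$((16#72))                 # the digits "72" interpreted as hex
printf -v right '%b' "\\U$hex"     # escape built from the converted value
printf -v wrong '%b' '\U72'        # escape built from the raw decimal digits
echo "hex=$hex misread=$misread right=$right wrong=$wrong"
```

Only the escape built from the converted value gives back the intended "H".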

So to use \U notation, you first need to convert the numbers to hex, which you can do with printf %x:

for c in "${array[@]}"; do
    temp+="\\U$(printf %x "$c")"    # Convert dec->hex, add \U
done
printf %b "$temp"    # Convert \U<codepoint> to actual characters

As dave_thompson_085 pointed out in a comment, you can simplify this even further by converting the entire array with a single printf:

printf %b "$(printf '\\U%x' "${array[@]}")"
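Since the question asks for the result in a variable rather than on standard output, one possible variation uses bash's printf -v. This is a sketch, not part of the original answer: the names escaped and str are invented here, and code points above 127 should only be expected to come out correctly in a UTF-8 locale.

```shell
array=( 72 101 108 108 111 32 87 111 114 108 100 33 )

# Build the escape text (\U48\U65...) in one call; printf reuses the
# format for every array element and -v stores the whole result.
printf -v escaped '\\U%x' "${array[@]}"

# Expand the \U escapes and store the resulting characters in str.
printf -v str '%b' "$escaped"

echo "$str"
```

Using -v avoids the command substitution, so no subshell is started.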

CodePudding user response:

Shell scripts can't do everything on their own. For complex tasks, they often rely on other utility programs that are common on Linux installations. In this case, iconv can help.

array=( 72 101 108 108 111 32 87 111 114 108 100 33 )
temp=
# Build \xHH escapes; as written this only handles code points below 128 (ASCII)
for c in "${array[@]}"; do temp+=$(printf '\\x%x' "$c"); done
temp=$(echo -ne "$temp" | iconv -f utf8)
printf %b "$temp"
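The loop above writes a single \x byte per array element, so it cannot represent larger values. A possible way to cover the full 16-bit range from the question with iconv is to emit each code point as two big-endian bytes and convert from UTF-16BE. This is only a sketch: the sample values and the names fmt and str are made up for illustration, it assumes the installed iconv supports UTF-16BE, and it does not handle the surrogate range (55296 to 57343).

```shell
array=( 72 233 8364 )    # H, e-acute, euro sign (sample values for illustration)

fmt=
for c in "${array[@]}"; do
    # Split each 16-bit code point into a high byte and a low byte.
    fmt+=$(printf '\\x%02x\\x%02x' $(( c >> 8 )) $(( c & 255 )))
done

# Emit the raw big-endian bytes and let iconv re-encode them as UTF-8.
# fmt contains only \xHH escapes, so it is safe to use as a format string.
str=$(printf "$fmt" | iconv -f UTF-16BE -t UTF-8)
echo "$str"
```

The bytes are piped straight into iconv because a shell variable cannot hold the NUL bytes that UTF-16 uses for ASCII characters.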

CodePudding user response:

Why are you calling the array "array" and then continuing with a string?

tmp=""
arr=( 72 101 108 108 111 32 87 111 114 108 100 33 )
# \U expects hex digits, so convert each decimal code point first
for c in "${arr[@]}"; do tmp+="\\U$(printf %x "$c")"; done
printf %b "$tmp"