I have an array of Unicode code points that I want to convert back into characters and store in a variable as a string. In the example below it's just the code point array for "Hello World!", but I could have any Unicode number (up to 16 bits).
array=( 72 101 108 108 111 32 87 111 114 108 100 33 )
I checked:
- How to convert \uXXXX unicode to UTF-8 using console tools in *nix
- How can I convert all UTF8 Unicode characters in a string to their relevant Codepoints using bash/shell/zsh?
and other online resources, but I still can't figure out how to do this. I tried things like:
temp=
for c in ${array[@]}; do
    temp+="\U$c"
done
printf %b "$temp"
I also saw that bash has a newer feature that lets you do either echo -e '\Uxxxxx' or $'\Uxxx', but it doesn't work in my case: even if I iterate over the array and store each code point in a variable i, the single quotes prevent bash from expanding it, as in echo $'\U$i'. I even tried echo "$'\U$i'".
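To illustrate the quoting problem described above: ANSI-C quoting ($'...') is decoded when bash parses the line, before any variable expansion, so it only works with a literal hex code point. A minimal sketch, assuming bash 4.2 or newer (the release that added the \u and \U escapes); the variable names are illustrative only:

```bash
#!/usr/bin/env bash
# $'...' is decoded at parse time, so a literal hex code point works:
literal=$'\U48'              # U+0048, i.e. "H"
printf '%s\n' "$literal"     # prints: H

# Inside $'...' a "$" is an ordinary character: $i is never expanded,
# and \U is not followed by any hex digits.
i=48
unexpanded=$'\U$i'
printf '%s\n' "$unexpanded"
```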
I'm at a loss as to how to do this in pure bash in a simple way.
Answer:
The thing that's tripping you up is that your array holds the code points as decimal numbers, but the \U notation takes hexadecimal numbers. For example, the first element in the array is "72": in decimal that's the code for "H", but read as hex it's equivalent to decimal 114, which is the code for "r".
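A quick sketch of the decimal/hex mix-up, assuming bash 4.2 or newer so that printf %b understands \U:

```bash
#!/usr/bin/env bash
# \U reads its digits as hex, so "72" means 0x72 (decimal 114), which is "r"
as_hex=$(printf %b '\U72')
# Decimal 72 is 0x48, which is "H"
correct=$(printf %b '\U48')
# printf %x performs the decimal -> hex conversion
hex=$(printf %x 72)
printf '%s %s %s\n' "$as_hex" "$correct" "$hex"   # prints: r H 48
```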
So to use the \U notation, you first need to convert the numbers to hex, which you can do with printf %x:
temp=
for c in "${array[@]}"; do
    temp+="\\U$(printf %x "$c")"   # Convert dec->hex, add \U
done
printf %b "$temp"                  # Convert \U<codepoint> to actual characters
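A variant of the loop above, sketched under the assumption that you want the result in a variable rather than printed: bash's printf -v writes into a variable, which avoids forking a $(...) subshell for every element and stores the final string directly (the names hex and result are illustrative):

```bash
#!/usr/bin/env bash
array=( 72 101 108 108 111 32 87 111 114 108 100 33 )
temp=
for c in "${array[@]}"; do
    printf -v hex %x "$c"    # decimal -> hex, no command substitution needed
    temp+="\\U$hex"          # build the escape text, e.g. \U48
done
printf -v result %b "$temp"  # expand the \U escapes and store them in $result
printf '%s\n' "$result"      # prints: Hello World!
```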
As dave_thompson_085 pointed out in a comment, you can simplify this even further by converting the entire array with a single printf:
printf %b "$(printf '\\U%x' "${array[@]}")"
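Since the question asks for the result in a variable, here is a sketch that wraps the one-liner above in a function using printf -v. The function name codepoints_to_string and its "variable name first" calling convention are hypothetical, chosen only for illustration; it relies on printf reusing its format string for each remaining argument:

```bash
#!/usr/bin/env bash
# Hypothetical helper: codepoints_to_string VAR CODEPOINT...
# Stores the characters for the given decimal code points in the variable named VAR.
codepoints_to_string() {
    local __var=$1
    shift
    if (( $# == 0 )); then
        # With no arguments printf would still run its format once (as \U0),
        # so handle the empty case explicitly
        printf -v "$__var" '%s' ''
        return
    fi
    # Inner printf repeats '\\U%x' for every argument: decimal -> "\U<hex>" escapes.
    # Outer printf %b turns the escapes into characters; -v stores instead of printing.
    printf -v "$__var" %b "$(printf '\\U%x' "$@")"
}

array=( 72 101 108 108 111 32 87 111 114 108 100 33 )
codepoints_to_string greeting "${array[@]}"
printf '%s\n' "$greeting"   # prints: Hello World!
```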
Answer:
Shell scripts can't do everything on their own. For more complex tasks, they often rely on other utility programs that are common in Linux installations. In this case, iconv can help.
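array=( 72 101 108 108 111 32 87 111 114 108 100 33 )
temp=
for c in "${array[@]}"; do temp+=$(printf '\\x%x' "$c"); done   # build \xHH escapes
temp=$(echo -ne "$temp" | iconv -f utf8)   # escapes -> bytes, then UTF-8 -> locale charset
printf '%s\n' "$temp"
Note that \x%x yields a single byte per code point (and echo -e reads at most two hex digits after \x), so this only produces correct text for ASCII code points (below 128).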
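If the input can contain code points above 127, one way to keep the iconv idea is to encode each code point as four big-endian bytes (UTF-32BE) and let iconv convert that to UTF-8. This is a sketch that assumes your iconv recognizes the encoding name UTF-32BE (glibc and GNU libiconv do); the extra code points 233 ("é") and 8364 ("€") are added only to exercise the non-ASCII path:

```bash
#!/usr/bin/env bash
array=( 72 101 108 108 111 32 87 111 114 108 100 33 233 8364 )
fmt=
for c in "${array[@]}"; do
    # Write the 4 bytes as \xHH escape text. NUL bytes can't be stored in a bash
    # variable, so only the escapes are kept and printf emits the real bytes below.
    printf -v bytes '\\x%02x\\x%02x\\x%02x\\x%02x' \
        $(( c >> 24 & 255 )) $(( c >> 16 & 255 )) $(( c >> 8 & 255 )) $(( c & 255 ))
    fmt+=$bytes
done
# printf turns the escapes into raw UTF-32BE bytes; iconv re-encodes them as UTF-8
temp=$(printf "$fmt" | iconv -f UTF-32BE -t UTF-8)
printf '%s\n' "$temp"
```

Here -t UTF-8 is given explicitly so the result does not depend on the locale; omitting it converts to the current locale's charset instead.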
Answer:
Why are you calling the array array and then continuing with string?
tmp=""
arr=( 72 101 108 108 111 32 87 111 114 108 100 33 )
for c in "${arr[@]}"; do tmp+="\\U$(printf %x "$c")"; done   # \U expects hex, so convert each decimal code point
printf %b "$tmp"