UTF-8 text printed with Erlang's io:format comes out as \x (backslash x) ASCII escapes

I want to print a Unicode string (a list of code points) as UTF-8 on my Linux terminal.

-module('main').
-export([main/1]).

main(_) ->
  Text = "あいうえお",
  io:format("~ts~n", [Text]),
  halt().

When I compile and run it on Ubuntu 22.04,

$ erlc main.erl
$ erl -noshell -run main main run
\x{3042}\x{3044}\x{3046}\x{3048}\x{304A}

it shows as \x{3042} instead of あ.

In UTF-8, "あいうえお" should take 15 bytes. How can I split \x{3042} into 3 bytes and print あ?

"あ" is a Japanese character by the way.

list_to_binary didn't work for Unicode.

I found unicode:characters_to_list, which converts a binary to a list for Unicode, but I couldn't find the opposite.

Answer:

If you want to use Erlang's Unicode output, remove the -noshell flag. Adding +pc unicode, which makes the runtime treat Unicode code points as printable characters, is also good practice.

$ erl +pc unicode -run main main run
Erlang/OTP 24 [erts-12.2.1] [source] [64-bit] ...

あいうえお
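
If -noshell has to stay (for example in a script), one option that is often suggested is to switch the standard output to Unicode before printing. This is a minimal sketch, assuming io:setopts/2 on standard_io is honored by your OTP release in -noshell mode. The helper name print_utf8/1 is made up for illustration and is not part of the original code.

```erlang
-module(main).
-export([main/1, print_utf8/1]).

main(_) ->
    print_utf8("あいうえお"),
    halt().

%% Hypothetical helper: set the encoding of standard_io to unicode,
%% then print the list of code points with ~ts.
print_utf8(Text) ->
    ok = io:setopts(standard_io, [{encoding, unicode}]),
    io:format("~ts~n", [Text]).
```

It can be compiled and run the same way as the original module (erlc main.erl, then erl -noshell -run main main run).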

In Erlang you can mark a binary segment as UTF-8 with the /utf8 type specifier. For example, this shows the three-byte representation of the Japanese character "あ":

1> <<"あ"/utf8>>.                                                                          
<<227,129,130>>
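
As a quick check of the byte counts mentioned in the question, here is a small sketch. The module name utf8_size and the function sizes/0 are illustrative only.

```erlang
-module(utf8_size).
-export([sizes/0]).

%% Returns {bytes in "あ", bytes in "あいうえお", code points in the list}.
sizes() ->
    One  = <<"あ"/utf8>>,
    Five = <<"あいうえお"/utf8>>,
    {byte_size(One), byte_size(Five), length("あいうえお")}.
```

Calling utf8_size:sizes() returns {3,15,5}: five code points in the list, each encoded as three bytes in the UTF-8 binary.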

In your example, to take the first character of your string, convert the list to a UTF-8 binary and take its first three bytes:

1> Text = "あいうえお".                                                                    
[12354,12356,12358,12360,12362]
2> unicode:characters_to_binary(Text, unicode, utf8).                                      
<<227,129,130,227,129,132,227,129,134,227,129,136,227,129,138>>
3> binary:part(unicode:characters_to_binary(Text, unicode, utf8),0,3).                     
<<227,129,130>>
4> io:format("~ts~n",[binary:part(unicode:characters_to_binary(Text, unicode, utf8),0,3)]).
あ
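
The same idea can be sketched without hard-coding the three-byte length. unicode:characters_to_binary/1 produces UTF-8 by default, and a /utf8 segment in a binary pattern decodes exactly one code point, however many bytes it occupies. The module and function names below are illustrative only. Note that this takes one code point, which is not always a whole user-visible glyph when combining characters are involved.

```erlang
-module(first_char).
-export([first/1]).

%% Takes a list of code points and returns the first code point
%% re-encoded as a UTF-8 binary.
first(Text) ->
    Bin = unicode:characters_to_binary(Text),
    <<CodePoint/utf8, _Rest/binary>> = Bin,
    <<CodePoint/utf8>>.
```

For the string in the question, first_char:first("あいうえお") returns <<227,129,130>>, which can then be passed to io:format with ~ts.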

To save Unicode text to a file, open the file with Erlang's {encoding, utf8} option.

5>  {ok,G} = file:open("/tmp/unicode.txt",[write,{encoding,utf8}]).
{ok,<0.148.0>}
6> io:put_chars(G,Text).  
ok
7> file:close(G).

Then, in a shell:

$ file /tmp/unicode.txt
/tmp/unicode.txt: Unicode text, UTF-8 text, with no line terminators

$ cat /tmp/unicode.txt 
あいうえお
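
An alternative sketch writes the bytes directly, assuming the whole text fits in memory: convert the list to a UTF-8 binary and pass it to file:write_file/2. The function name roundtrip/2 and the path argument are placeholders for illustration.

```erlang
-module(utf8_file).
-export([roundtrip/2]).

%% Writes Text (a list of code points) to Path as UTF-8,
%% reads the file back, and returns the decoded list.
roundtrip(Path, Text) ->
    ok = file:write_file(Path, unicode:characters_to_binary(Text)),
    {ok, Bin} = file:read_file(Path),
    unicode:characters_to_list(Bin).
```

Reading the file back and getting the original list confirms that the bytes on disk are valid UTF-8 for that text.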