I want to print a UTF-8 string (a list of characters) on my Linux terminal.
-module('main').
-export([main/1]).
main(_) ->
    Text = "あいうえお",
    io:format("~ts~n", [Text]),
    halt().
When I compile and run it on Ubuntu 22.04:
$ erlc main.erl
$ erl -noshell -run main main run
\x{3042}\x{3044}\x{3046}\x{3048}\x{304A}
It prints \x{3042} instead of あ.
In UTF-8, "あいうえお" should be 15 bytes. How can I split \x{3042} into 3 bytes and print あ?
("あ" is a Japanese character, by the way.)
list_to_binary didn't work for Unicode.
I found unicode:characters_to_list, which converts a binary to a list for Unicode, but I couldn't find the opposite.
CodePudding user response:
If you want to use Erlang's Unicode output, remove the -noshell flag. Adding +pc unicode is also good practice.
$ erl +pc unicode -run main main run
Erlang/OTP 24 [erts-12.2.1] [source] [64-bit] ...
あいうえお
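If -noshell has to stay (for example in a script), one possible alternative is to switch standard_io to Unicode before printing. This is a minimal sketch, assuming a UTF-8 terminal locale and an OTP release where -noshell leaves standard_io in latin1 mode, which would explain the \x{3042} escapes. print_utf8/1 is a hypothetical helper name, and the string is written as code points so the sketch does not depend on the source file's encoding.

```erlang
-module(main).
-export([main/1, print_utf8/1]).

%% Hypothetical helper: put standard_io into Unicode mode, then print.
%% Assumption: the terminal expects UTF-8 (e.g. LANG=en_US.UTF-8).
print_utf8(Chars) ->
    ok = io:setopts(standard_io, [{encoding, unicode}]),
    io:format("~ts~n", [Chars]).

main(_) ->
    Text = [12354, 12356, 12358, 12360, 12362],  % code points of the question's string
    print_utf8(Text),
    halt().
```

It would be run the same way as in the question, with erl -noshell -run main main run.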
In Erlang you can specify that a binary is UTF-8 encoded. For example, this shows the three-byte binary representation of the Japanese character "あ":
1> <<"あ"/utf8>>.
<<227,129,130>>
In your example, you can take the first glyph of your string by converting the list to a UTF-8 binary and taking its first three bytes:
1> Text = "あいうえお".
[12354,12356,12358,12360,12362]
2> unicode:characters_to_binary(Text, unicode, utf8).
<<227,129,130,227,129,132,227,129,134,227,129,136,227,129,138>>
3> binary:part(unicode:characters_to_binary(Text, unicode, utf8),0,3).
<<227,129,130>>
4> io:format("~ts~n",[binary:part(unicode:characters_to_binary(Text, unicode, utf8),0,3)]).
あ
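The binary:part(..., 0, 3) call above works because "あ" happens to occupy three bytes. A minimal sketch that avoids hard-coding the byte count takes the first code point from the list and encodes only that. The module name utf8_demo and both function names are hypothetical.

```erlang
-module(utf8_demo).
-export([to_utf8/1, first_char_utf8/1]).

%% Encode a list of Unicode code points as a UTF-8 binary.
to_utf8(Chars) ->
    unicode:characters_to_binary(Chars, unicode, utf8).

%% Encode only the first code point, whatever its UTF-8 width.
%% Note: this is one code point, not necessarily a full grapheme cluster.
first_char_utf8([First | _]) ->
    unicode:characters_to_binary([First], unicode, utf8).
```

With the list from the session above, utf8_demo:first_char_utf8(Text) gives <<227,129,130>>, and byte_size(utf8_demo:to_utf8(Text)) gives 15, matching the byte count expected in the question.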
To save Unicode text to a file, use Erlang's file encoding options:
5> {ok,G} = file:open("/tmp/unicode.txt",[write,{encoding,utf8}]).
{ok,<0.148.0>}
6> io:put_chars(G,Text).
ok
7> file:close(G).
Then, in a shell:
$ file /tmp/unicode.txt
/tmp/unicode.txt: Unicode text, UTF-8 text, with no line terminators
$ cat /tmp/unicode.txt
あいうえお
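As a possible variation on the file example, the list can be converted to a UTF-8 binary first and written in one call, then read back and decoded. This is only a sketch: the module name utf8_file and its function names are hypothetical, and error handling is left out.

```erlang
-module(utf8_file).
-export([write_utf8/2, read_utf8/1]).

%% Write a list of code points to Path as UTF-8 bytes.
write_utf8(Path, Chars) ->
    file:write_file(Path, unicode:characters_to_binary(Chars, unicode, utf8)).

%% Read the raw bytes back and decode them into a list of code points.
%% Crashes on a read error; a real program would handle {error, Reason}.
read_utf8(Path) ->
    {ok, Bin} = file:read_file(Path),
    unicode:characters_to_list(Bin, utf8).
```

Reading back a file written this way should return the same list of code points that was written.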