addstr
— code, output:
use Curses;
initscr;
addstr 0, 0, 'Ж 会 र';
addstr 1, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getch;
endwin;
Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd
addstring
— code, output:
use Curses;
initscr;
addstring 0, 0, 'Ж 会 र';
addstring 1, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getchar;
endwin;
Ð~V ä¼~Z र
Curses 1.43, perl v5.36.0, OS: openbsd
Why is this behavior observed?
Shouldn't it be the other way around, since addstr is legacy whereas addstring is meant to support Unicode?
https://metacpan.org/pod/Curses#Wide-Character-Aware-Functions
https://metacpan.org/pod/Curses#Available-Wide-Character-Aware-Functions
Update:
A wider example, with the Unicode string:
- hardcoded
- taken from a variable
- passed as a CLI argument
- read from a file via backticks
- read from a file via open
We need a file containing the Unicode string:
echo -n 'Ж 会 र' > unicode.string.txt
Case 1: addstr, no additional declarations:
use Curses;
my $unicode_string_variable = 'Ж 会 र';
my $unicode_string_argv = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|', 'cat', 'unicode.string.txt' or die "open failed: $!";
my $unicode_string_open_pipe = <$open_pipe_read_handle>;
# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die "open failed: $!";
print $hardcoded_handle 'Ж 会 र';
close $hardcoded_handle;
open my $variable_handle, '>', 'unicode.string.variable' or die "open failed: $!";
print $variable_handle $unicode_string_variable;
close $variable_handle;
open my $argv_handle, '>', 'unicode.string.argv' or die "open failed: $!";
print $argv_handle $unicode_string_argv;
close $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die "open failed: $!";
print $backticks_handle $unicode_string_backticks;
close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die "open failed: $!";
print $open_pipe_handle $unicode_string_open_pipe;
close $open_pipe_handle;
# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;
initscr;
# print unicode to Curses
addstr 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstr 1, 0, 'variable : ' . $unicode_string_variable;
addstr 2, 0, 'argv : ' . $unicode_string_argv;
addstr 3, 0, 'backticks: ' . $unicode_string_backticks;
addstr 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;
addstr 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getchar;
endwin;
run:
perl curses-unicode.addstr.pl 'Ж 会 र'
Curses output, all-working unicode:
hardcoded: Ж 会 र
variable : Ж 会 र
argv : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd
STDOUT output, all-working unicode:
hardcoded: Ж 会 र
variable : Ж 会 र
argv : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र
Files output, all-working unicode:
cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र
Case 2: addstring, no additional declarations:
use Curses;
my $unicode_string_variable = 'Ж 会 र';
my $unicode_string_argv = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|', 'cat', 'unicode.string.txt' or die "open failed: $!";
my $unicode_string_open_pipe = <$open_pipe_read_handle>;
# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die "open failed: $!";
print $hardcoded_handle 'Ж 会 र';
close $hardcoded_handle;
open my $variable_handle, '>', 'unicode.string.variable' or die "open failed: $!";
print $variable_handle $unicode_string_variable;
close $variable_handle;
open my $argv_handle, '>', 'unicode.string.argv' or die "open failed: $!";
print $argv_handle $unicode_string_argv;
close $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die "open failed: $!";
print $backticks_handle $unicode_string_backticks;
close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die "open failed: $!";
print $open_pipe_handle $unicode_string_open_pipe;
close $open_pipe_handle;
# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;
initscr;
# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;
addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getchar;
endwin;
run:
perl curses-unicode.addstring.pl 'Ж 会 र'
Curses output, all-broken unicode:
hardcoded: Ð~V ä¼~Z र
variable : Ð~V ä¼~Z र
argv : Ð~V ä¼~Z र
backticks: Ð~V ä¼~Z र
open_pipe: Ð~V ä¼~Z र
Curses 1.43, perl v5.36.0, OS: openbsd
STDOUT output, all-working unicode:
hardcoded: Ж 会 र
variable : Ж 会 र
argv : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र
Files output, all-working unicode:
cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र
Case 3: addstring, with the additional declarations use utf8, -CA and :encoding(UTF-8):
use utf8;
use Curses;
my $unicode_string_variable = 'Ж 会 र';
my $unicode_string_argv = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|:encoding(UTF-8)', 'cat', 'unicode.string.txt' or die "open failed: $!";
my $unicode_string_open_pipe = <$open_pipe_read_handle>;
# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die "open failed: $!";
print $hardcoded_handle 'Ж 会 र';
close $hardcoded_handle;
open my $variable_handle, '>', 'unicode.string.variable' or die "open failed: $!";
print $variable_handle $unicode_string_variable;
close $variable_handle;
open my $argv_handle, '>', 'unicode.string.argv' or die "open failed: $!";
print $argv_handle $unicode_string_argv;
close $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die "open failed: $!";
print $backticks_handle $unicode_string_backticks;
close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die "open failed: $!";
print $open_pipe_handle $unicode_string_open_pipe;
close $open_pipe_handle;
# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;
initscr;
# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;
addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getchar;
endwin;
run:
perl -CA curses-unicode.addstring.utf8,CA,encodingUTF8.pl 'Ж 会 र'
Curses output, partially-working, partially-broken unicode:
hardcoded: Ж 会 र
variable : Ж 会 र
argv : Ж 会 र
backticks: Ð~V ä¼~Z र
open_pipe: Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd
STDOUT&STDERR output, all-working unicode:
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 12, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 15, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 18, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 24, <$open_pipe_read_handle> line 1.
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 28, <$open_pipe_read_handle> line 1.
hardcoded: Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 29, <$open_pipe_read_handle> line 1.
variable : Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 30, <$open_pipe_read_handle> line 1.
argv : Ж 会 र
backticks: Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 32, <$open_pipe_read_handle> line 1.
open_pipe: Ж 会 र
Files output, all-working unicode:
cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र
- Why does Unicode just work for STDOUT and for writing to files in all 3 cases without any hassle, whereas Curses balks? What is so special about Curses? Isn't it a bug of some kind in Curses, given that all is OK with STDOUT and files?
- Is there a single place to enable Unicode, or does it have to be specified separately: use utf8 for Unicode in the source; -CA for CLI arguments; :encoding(UTF-8) for open? Where is the uniformity, and why?
- How to fix Unicode for backticks?
- What are the messages
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line ..., <$open_pipe_read_handle> line 1.
on STDERR, and how to get rid of them?
CodePudding user response:
You need the use utf8; pragma:
use utf8;
use Curses;
initscr;
addstring 0, 0, 'Ж 会 र';
addstring 1, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getch;
endwin;
Output:
Ж 会 र
Curses 1.43, perl v5.34.0, OS: linux
See the Perl Unicode FAQ. Why the addstr version works is probably a matter of luck (on my system only the third character is displayed correctly).
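To see what use utf8; changes, here is a minimal sketch using only the core Encode module. It rests on an assumption that the answer does not state: that a character-aware function such as addstring treats every element of the Perl string as one code point. If that holds, an undecoded literal reaches it as 10 separate Latin-1 code points (hence output such as Ð ä ¼), whereas a decoded string reaches it as the 5 intended characters. The bytes are written as escapes so that the sketch does not depend on the source encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of 'Ж 会 र', written with escapes so that this
# sketch does not depend on "use utf8" or on the editor's encoding.
my $bytes = "\xD0\x96 \xE4\xBC\x9A \xE0\xA4\xB0";

# Decoding turns the 10 bytes into 5 characters; this is roughly what
# "use utf8" does for string literals in the source file.
my $chars = decode('UTF-8', $bytes);

# The labels are ASCII-only, so nothing wide is printed here.
printf "undecoded: %d elements, first is U+%04X\n", length($bytes), ord($bytes);
printf "decoded:   %d elements, first is U+%04X\n", length($chars), ord($chars);
# prints:
# undecoded: 10 elements, first is U+00D0
# decoded:   5 elements, first is U+0416
```

U+00D0 is Ð, the first character of the broken output in the question, and U+0416 is Ж.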
If you want to handle command-line arguments from @ARGV as UTF-8, you need a different approach. One way is to call Perl explicitly with the -C flag set to A or 32 (a special setting that controls the decoding of @ARGV), or equivalently to set the PERL_UNICODE environment variable in the terminal to A.
Alternatively you can decode @ARGV from within the code:
use Encode qw(decode_utf8);
@ARGV = map { decode_utf8($_, 1) } @ARGV;
In this case you don't need the command line flag.
This alternative also works for backticks substitution:
use Encode qw(decode_utf8);
my $unicode_string_backticks = decode_utf8(`cat unicode.string.txt`, 1);
Source: https://www.perl.com/pub/2012/04/perlunicookbook-decode-argv-as-utf8.html/
There is, however, a simpler solution that sets UTF-8 for hardcoded strings, @ARGV, filehandles, printf, backticks etc. simultaneously: the utf8::all module. With it you don't need command-line flags or the Encode module. Because it also targets STDOUT, the warnings about wide characters are resolved as well.
use utf8::all;
use Curses;
my $unicode_string_variable = 'Ж 会 र';
my $unicode_string_argv = $ARGV[0];
#my $unicode_string_backticks = decode_utf8(`cat unicode.string.txt`,1);
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|:encoding(UTF-8)', 'cat', 'unicode.string.txt' or die "open failed: $!";
my $unicode_string_open_pipe = <$open_pipe_read_handle>;
# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die "open failed: $!";
print $hardcoded_handle 'Ж 会 र';
close $hardcoded_handle;
open my $variable_handle, '>', 'unicode.string.variable' or die "open failed: $!";
print $variable_handle $unicode_string_variable;
close $variable_handle;
open my $argv_handle, '>', 'unicode.string.argv' or die "open failed: $!";
print $argv_handle $unicode_string_argv;
close $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die "open failed: $!";
print $backticks_handle $unicode_string_backticks;
close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die "open failed: $!";
print $open_pipe_handle $unicode_string_open_pipe;
close $open_pipe_handle;
# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;
initscr;
# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;
addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getchar;
endwin;
Source: https://blog.ostermiller.org/perl-wide-character-in-print/
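For reference, the "Wide character in print" warning can also be reproduced and avoided with core Perl alone. The sketch below is only an illustration, not what utf8::all does internally: it prints one decoded character to two in-memory filehandles (chosen here so that nothing touches the terminal), one without an encoding layer and one with :encoding(UTF-8). The variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = "\x{416}";    # 'Ж' as a single decoded character

# Collect warnings instead of letting them reach STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# No encoding layer: perl warns, then writes the UTF-8 bytes anyway.
open my $raw, '>', \my $raw_buffer or die "open failed: $!";
print {$raw} $text;
close $raw;

# Explicit encoding layer: same bytes, no warning.
open my $encoded, '>:encoding(UTF-8)', \my $encoded_buffer or die "open failed: $!";
print {$encoded} $text;
close $encoded;

printf "warnings collected: %d\n", scalar @warnings;
printf "buffers identical: %s\n", $raw_buffer eq $encoded_buffer ? 'yes' : 'no';
```

This matches what the question observed in case 3: the files and STDOUT looked right, yet every print of a decoded string produced a warning. For STDOUT itself the equivalent fix would be binmode(STDOUT, ':encoding(UTF-8)').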
If for whatever reason you can't or don't want to install this module, then use utf8; together with the command-line flags -CSDA also resolves all issues. Note that with these command-line flags you should not use decode_utf8() in your code.
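If a script has to work both with and without the flag, one possible way to avoid decoding twice is to check ${^UNICODE}, the read-only variable that reflects the -C / PERL_UNICODE settings, before decoding. The following is a sketch under that assumption; decode_args_once is a hypothetical helper name, not something provided by Encode or Curses.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode the arguments only when perl has not
# already decoded @ARGV itself via -CA or PERL_UNICODE=A.
sub decode_args_once {
    my @args = @_;
    # Bit 32 of ${^UNICODE} corresponds to the "A" letter of -C.
    return @args if ${^UNICODE} & 32;
    # FB_CROAK dies on malformed input; LEAVE_SRC keeps the source intact.
    return map {
        Encode::decode('UTF-8', $_, Encode::FB_CROAK | Encode::LEAVE_SRC)
    } @args;
}

my @args = decode_args_once(@ARGV);
printf "%d argument(s), -CA active: %s\n",
    scalar @args, (${^UNICODE} & 32) ? 'yes' : 'no';
```

Run as perl script.pl 'Ж 会 र' or as perl -CA script.pl 'Ж 会 र'; in both cases @args should hold decoded characters.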