Perl Curses and Unicode: why does addstr print fine whereas addstring prints garbage?

Time:01-09

addstr — code, output:

use Curses;
initscr;
addstr 0, 0, 'Ж 会 र';
addstr 1, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getch;
endwin;
Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd

addstring — code, output:

use Curses;
initscr;
addstring 0, 0, 'Ж 会 र';
addstring 1, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getchar;
endwin;
Ð~V ä¼~Z र
Curses 1.43, perl v5.36.0, OS: openbsd

Why is this behavior observed?
Shouldn't it be the other way round, since addstr is the legacy function whereas addstring is meant to support Unicode?

https://metacpan.org/pod/Curses#Wide-Character-Aware-Functions
https://metacpan.org/pod/Curses#Available-Wide-Character-Aware-Functions


Update:

Wider example, with unicode string:

  • hardcoded,
  • taken from a variable
  • passed as a CLI argument
  • read from a file via backticks
  • read from a file via open

We need a file containing the unicode string:

echo -n 'Ж 会 र' > unicode.string.txt

Case 1: addstr, no additional declarations:

use Curses;

my $unicode_string_variable  = 'Ж 会 र';
my $unicode_string_argv      = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|', 'cat', 'unicode.string.txt' or die $!;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;

# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die $!;
  print $hardcoded_handle 'Ж 会 र';
  close $hardcoded_handle;
open my  $variable_handle, '>', 'unicode.string.variable'  or die $!;
  print  $variable_handle $unicode_string_variable;
  close  $variable_handle;
open my      $argv_handle, '>', 'unicode.string.argv'      or die $!;
  print      $argv_handle $unicode_string_argv;
  close      $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die $!;
  print $backticks_handle $unicode_string_backticks;
  close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die $!;
  print $open_pipe_handle $unicode_string_open_pipe;
  close $open_pipe_handle;

# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv     ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;

initscr;

# print unicode to Curses
addstr 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstr 1, 0, 'variable : ' . $unicode_string_variable;
addstr 2, 0, 'argv     : ' . $unicode_string_argv;
addstr 3, 0, 'backticks: ' . $unicode_string_backticks;
addstr 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;

addstr 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";

getchar;
endwin;

run:

perl curses-unicode.addstr.pl 'Ж 会 र'

Curses output, all-working unicode:

hardcoded: Ж 会 र
variable : Ж 会 र
argv     : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd

STDOUT output, all-working unicode:

hardcoded: Ж 会 र
variable : Ж 会 र
argv     : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र

Files output, all-working unicode:

cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र

Case 2: addstring, no additional declarations:

use Curses;

my $unicode_string_variable  = 'Ж 会 र';
my $unicode_string_argv      = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|', 'cat', 'unicode.string.txt' or die $!;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;

# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die $!;
  print $hardcoded_handle 'Ж 会 र';
  close $hardcoded_handle;
open my  $variable_handle, '>', 'unicode.string.variable'  or die $!;
  print  $variable_handle $unicode_string_variable;
  close  $variable_handle;
open my      $argv_handle, '>', 'unicode.string.argv'      or die $!;
  print      $argv_handle $unicode_string_argv;
  close      $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die $!;
  print $backticks_handle $unicode_string_backticks;
  close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die $!;
  print $open_pipe_handle $unicode_string_open_pipe;
  close $open_pipe_handle;

# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv     ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;

initscr;

# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv     : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;

addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";

getchar;
endwin;

run:

perl curses-unicode.addstring.pl 'Ж 会 र'

Curses output, all-broken unicode:

hardcoded: Ð~V ä¼~Z र
variable : Ð~V ä¼~Z र
argv     : Ð~V ä¼~Z र
backticks: Ð~V ä¼~Z र
open_pipe: Ð~V ä¼~Z र
Curses 1.43, perl v5.36.0, OS: openbsd

STDOUT output, all-working unicode:

hardcoded: Ж 会 र
variable : Ж 会 र
argv     : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र

Files output, all-working unicode:

cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र

Case 3: addstring, additional declarations use utf8, -CA and :encoding(UTF-8):

use utf8;
use Curses;

my $unicode_string_variable  = 'Ж 会 र';
my $unicode_string_argv      = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|:encoding(UTF-8)', 'cat', 'unicode.string.txt' or die $!;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;

# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die $!;
  print $hardcoded_handle 'Ж 会 र';
  close $hardcoded_handle;
open my  $variable_handle, '>', 'unicode.string.variable'  or die $!;
  print  $variable_handle $unicode_string_variable;
  close  $variable_handle;
open my      $argv_handle, '>', 'unicode.string.argv'      or die $!;
  print      $argv_handle $unicode_string_argv;
  close      $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die $!;
  print $backticks_handle $unicode_string_backticks;
  close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die $!;
  print $open_pipe_handle $unicode_string_open_pipe;
  close $open_pipe_handle;

# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv     ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;

initscr;

# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv     : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;

addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";

getchar;
endwin;

run:

perl -CA curses-unicode.addstring.utf8,CA,encodingUTF8.pl 'Ж 会 र'

Curses output, partially-working, partially-broken unicode:

hardcoded: Ж 会 र
variable : Ж 会 र
argv     : Ж 会 र
backticks: Ð~V ä¼~Z र
open_pipe: Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd

STDOUT and STDERR output, all-working unicode, but with warnings:

Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 12, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 15, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 18, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 24, <$open_pipe_read_handle> line 1.
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 28, <$open_pipe_read_handle> line 1.
hardcoded: Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 29, <$open_pipe_read_handle> line 1.
variable : Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 30, <$open_pipe_read_handle> line 1.
argv     : Ж 会 र
backticks: Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 32, <$open_pipe_read_handle> line 1.
open_pipe: Ж 会 र

Files output, all-working unicode:

cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र

  • Why does unicode just work for STDOUT and for writing to files in all 3 cases without any hassle, whereas Curses balks? What is so special about Curses? Isn't this a bug of some kind in Curses, given that everything is OK with STDOUT and files?
  • Is there a single place to enable unicode, or does it have to be specified separately for each source? Why is there no uniformity?
    • use utf8 for unicode in the source;
    • -CA for cli arguments;
    • :encoding(UTF-8) for open
  • How do I fix unicode for backticks?
  • What are the messages Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line ..., <$open_pipe_read_handle> line 1. on STDERR, and how do I get rid of them?

Answer:

You need the use utf8; pragma:

use utf8;
use Curses;
initscr;
addstring 0, 0, 'Ж 会 र';
addstring 1, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getch;
endwin;

Output:

Ж 会 र
Curses 1.43, perl v5.34.0, OS: linux

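See the Perl Unicode FAQ. Without use utf8, perl reads the literal 'Ж 会 र' as a string of UTF-8 bytes rather than as three characters separated by spaces, and addstring expects a character string. That the addstr version works is probably a matter of luck (on my system only the third character is displayed correctly).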
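To see what use utf8 changes, it helps to look at the string from perl's side. The following is a minimal sketch that needs no terminal. code_points is a hypothetical helper, and the byte escapes are simply the UTF-8 encoding of 'Ж 会 र'. The assumption, consistent with the garbled output in the question, is that addstring treats every element of the string as one character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes of 'Ж 会 र', written as escapes so the result does not
# depend on how this file is saved or on whether "use utf8" is active.
my $bytes = "\xD0\x96 \xE4\xBC\x9A \xE0\xA4\xB0";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

# Hypothetical helper: list what perl sees as the elements of a string.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $string;
}

printf "bytes: length %2d, %s\n", length($bytes), code_points($bytes);
printf "chars: length %2d, %s\n", length($chars), code_points($chars);
```

The undecoded string has length 10 and starts with U+00D0 U+0096, that is Ð followed by a control character, which matches the garbage addstring shows in the question. The decoded string has length 5, one element per visible character or space.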

If you want the command line arguments in @ARGV to be decoded as UTF-8, you need a different approach. One way is to call perl explicitly with the -C flag set to A or 32 (the setting that makes perl treat the elements of @ARGV as UTF-8), or equivalently to set the PERL_UNICODE environment variable to A.

Alternatively, you can decode @ARGV from within the code:

use Encode qw(decode_utf8);
@ARGV = map { decode_utf8($_, 1) } @ARGV;

In this case you don't need the command line flag.
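Decoding an already decoded string fails or corrupts it, so a script that may or may not be started with -CA can check before decoding. A sketch, assuming the documented ${^UNICODE} variable (the numeric value of the -C switch, in which 32 stands for A); decode_args is a hypothetical helper name:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode arguments unless perl already did so.
# $unicode_flags is the value of ${^UNICODE}; bit 32 corresponds to -CA.
sub decode_args {
    my ($unicode_flags, @args) = @_;
    return @args if $unicode_flags & 32;    # already character strings
    return map { decode('UTF-8', $_, Encode::FB_CROAK | Encode::LEAVE_SRC) } @args;
}

@ARGV = decode_args(${^UNICODE}, @ARGV);
```

With this guard the script behaves the same whether it is run as perl script.pl or as perl -CA script.pl.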

The same approach also works for backtick output:

use Encode qw(decode_utf8);
my $unicode_string_backticks = decode_utf8(`cat unicode.string.txt`, 1);

Source: https://www.perl.com/pub/2012/04/perlunicookbook-decode-argv-as-utf8.html/


There is, however, a simpler solution that enables UTF-8 for hardcoded strings, @ARGV, filehandles, printf, backticks etc. in one place: the utf8::all module from CPAN. With it you need neither command line flags nor the Encode module. Because it also puts a UTF-8 layer on STDOUT, the warnings about wide characters go away as well.

use utf8::all;
use Curses;

my $unicode_string_variable  = 'Ж 会 र';
my $unicode_string_argv      = $ARGV[0];
#my $unicode_string_backticks = decode_utf8(`cat unicode.string.txt`,1);
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|:encoding(UTF-8)', 'cat', 'unicode.string.txt' or die $!;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;

# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' or die $!;
  print $hardcoded_handle 'Ж 会 र';
  close $hardcoded_handle;
open my  $variable_handle, '>', 'unicode.string.variable'  or die $!;
  print  $variable_handle $unicode_string_variable;
  close  $variable_handle;
open my      $argv_handle, '>', 'unicode.string.argv'      or die $!;
  print      $argv_handle $unicode_string_argv;
  close      $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' or die $!;
  print $backticks_handle $unicode_string_backticks;
  close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' or die $!;
  print $open_pipe_handle $unicode_string_open_pipe;
  close $open_pipe_handle;

# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv     ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;

initscr;

# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv     : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;

addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";

getchar;
endwin;

Source: https://blog.ostermiller.org/perl-wide-character-in-print/
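The "Wide character in print" warning appears when a string containing characters above 255 is printed to a handle that has no encoding layer. Perl then writes the characters as UTF-8 anyway, which is why the output still looks right. A small sketch that reproduces this without a terminal, using in-memory handles; count_print_warnings is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: print $text to an in-memory handle opened with the
# given layer ('' for none) and count the warnings that perl emits.
sub count_print_warnings {
    my ($layer, $text) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    open my $fh, ">$layer", \my $buffer or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "cannot close in-memory handle: $!";
    return scalar @warnings;
}

my $chars = "\x{416} \x{4F1A} \x{930}";             # decoded character string
my $bytes = "\xD0\x96 \xE4\xBC\x9A \xE0\xA4\xB0";   # the same text as raw UTF-8 bytes

printf "no layer, characters:    %d warning(s)\n", count_print_warnings('', $chars);
printf "no layer, bytes:         %d warning(s)\n", count_print_warnings('', $bytes);
printf "UTF-8 layer, characters: %d warning(s)\n", count_print_warnings(':encoding(UTF-8)', $chars);
```

In a script the corresponding fix is binmode(STDOUT, ':encoding(UTF-8)'), or an encoding layer in the open call for files. It also explains why the backticks line in Case 3 produces no warning: that string is still undecoded bytes.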

If for whatever reason you can't or don't want to install this module, then combining use utf8; with the command line flags -CSDA also resolves all issues. Note that with these flags you should not additionally call decode_utf8() in your code, because the data arrives already decoded.
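An in-script alternative that relies only on core modules is the open pragma. The sketch below assumes the script file is saved as UTF-8; write_text and read_text are hypothetical helpers. According to the pragma's documentation the default layer also applies to qx// (backticks), but @ARGV is not covered and still needs -CA or an explicit decode.

```perl
use strict;
use warnings;
use utf8;                             # string literals in this file are UTF-8
use open qw(:std :encoding(UTF-8));   # default layer for open and the standard handles
use File::Temp qw(tempfile);

# Hypothetical helpers; the opens name no layer, so they inherit the default.
sub write_text {
    my ($path, $text) = @_;
    open my $fh, '>', $path or die "cannot write $path: $!";
    print {$fh} $text;
    close $fh or die "cannot close $path: $!";
    return;
}

sub read_text {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot read $path: $!";
    my $text = do { local $/; <$fh> };
    close $fh or die "cannot close $path: $!";
    return $text;
}

my ($tmp_fh, $path) = tempfile(UNLINK => 1);   # scratch file, removed at exit
close $tmp_fh;

write_text($path, 'Ж 会 र');                   # encoded on the way out, no warning
my $text = read_text($path);                   # decoded on the way in
printf "%d characters read back: %s\n", length($text), $text;
```

The string read back is a character string, so it can be passed to addstring directly, and printing it to STDOUT raises no "Wide character" warning because :std puts the same layer on the standard handles.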
