I have a file, file1.pl:
use strict;
use warnings;
use Encode;
my @flist = `svn diff --summarize ...`;
foreach my $file (@flist) {
    my $foo = "$one/$file";
    use bytes;
    print(bytes::length($one)."\n");
    print(bytes::length($file)."\n");
    print(bytes::length($foo)."\n");
}
# 76
# 31
# 108
and file2.pl with the same main logic. But in file2.pl the output is:
# 76
# 31
# 110 <-- ?
Both files have the same encoding (ISO-8859-1). To get the same result as in file1.pl, I have to use
my $foo = "$one/".decode('UTF-8', $file);
in file2.pl. What could be the reason for that difference, or for needing decode('UTF-8', $file) in file2.pl? It seems to be related to "What if I don't decode?", but in what way, and why only in file2.pl? Thanks.
Perl v5.10.1
CodePudding user response:
Don't use bytes.
Its documentation says: "Use of this module for anything other than debugging purposes is strongly discouraged."
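bytes::length gets the length of the internal storage of a string, not the length of the string itself. The same string can be stored in more than one way, so that number is useless.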
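As a minimal sketch of that point (the string "caf\xE9" is just an illustrative value, not something from the question), the same four characters can report two different bytes::length values depending only on how Perl happens to store them:

```perl
use strict;
use warnings;
use bytes qw( );        # makes bytes::length available without enabling byte semantics

my $s = "caf\xE9";      # 4 characters, stored downgraded (1 byte per character)
my $t = $s;
utf8::upgrade( $t );    # same 4 characters, now stored upgraded

print $s eq $t ? "equal\n" : "different\n";                  # equal
print length( $s ), " ", length( $t ), "\n";                 # 4 4
print bytes::length( $s ), " ", bytes::length( $t ), "\n";   # 4 5
```

The two strings compare equal and have the same length; only the internal storage differs, and that is all bytes::length measures.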
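"What could be the reason for that difference"

$one and $file contained strings stored using different internal storage formats. One needed to be converted for a concatenation to occur. When a string is converted to the upgraded format, every character in the range 0x80 to 0xFF goes from one byte of storage to two, so two such characters in $file would account for the two extra bytes.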
use strict;
use warnings;
use feature qw( say );
use bytes qw( );
use Encode qw( encode );
sub dump_lengths {
    my $s = shift;
    say
        join " ",
            length( $s ),
            length( encode( "UTF-8", $s ) ),
            bytes::length( $s );
}

#                          ------ Length of string
my $x = chr( 0xE9 );     # | ---- Length of its UTF-8 encoding
my $y = chr( 0x2660 );   # | | -- Length of internal storage
                         # | | |
dump_lengths( $x );      # 1 2 1
dump_lengths( $y );      # 1 3 3
my $z = $x . $y;
dump_lengths( $z );      # 2 5 5
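Applied to the question, this might look like the sketch below. The values of $one and $file are hypothetical stand-ins, and the idea that $one is stored upgraded in file2.pl is an assumption; the question doesn't show where $one comes from.

```perl
use strict;
use warnings;
use bytes  qw( );
use Encode qw( decode );

# Hypothetical stand-ins for the question's variables.
my $one  = "dir";                # ASCII-only prefix
my $file = "caf\xC3\xA9.txt";    # undecoded UTF-8 bytes, as read from a command

# file1.pl case: both strings are stored downgraded, so nothing is converted.
my $foo1 = "$one/$file";

# file2.pl case (assumed): $one happens to be stored upgraded.
my $one_up = $one;
utf8::upgrade( $one_up );
my $foo2 = "$one_up/$file";      # bytes 0xC3 and 0xA9 now take 2 bytes each

# Decoding first turns those two bytes into the single character U+00E9.
my $foo3 = "$one_up/" . decode( 'UTF-8', $file );

printf "%d %d %d\n", map { bytes::length( $_ ) } $foo1, $foo2, $foo3;   # 13 15 13
printf "%d %d %d\n", map { length( $_ ) }        $foo1, $foo2, $foo3;   # 13 13 12
```

Note that $foo1 and $foo3 agree on bytes::length only by coincidence of storage: they are different strings (13 characters vs 12). Decoding inputs and comparing with length avoids depending on the storage format at all.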