Convert text to hexadecimal values

I'm trying to print the word "for sale" in Arabic (عربي), but my terminal reverses it, laying it out left to right, even though Arabic is written right to left. Transliterated, the word is "llbye", but the terminal writes "eybll" (ﻊﻴﺒﻠﻟ).

use strict;
use warnings;
use utf8;

binmode( STDOUT, ':utf8' );

use Encode qw< encode decode >;

my $str = 'ﻟﻠﺒﻴﻊ';    # "for sale"
my $enc = encode( 'UTF-8', $str );
my $dec = decode( 'UTF-8', $enc );

my $decoded = pack 'U0W*', map  ord, split //, $enc;

print "Original string : $str\n";     #  ل ل ب ي ع
print "Decoded string 1: $dec\n";      #  ل ل ب ي ع
print "Decoded string 2: $decoded\n"; #  ل ل ب ي ع
my $k = reverse($decoded);
print "Decode  reverse : $k\n";
print "0x$_" for unpack "H*", scalar reverse "$decoded\n";

On line 21 of my script (the unpack line at the end of the snippet above), I'm trying to convert these characters to a hex dump to visualize them better, but I get:

Character in 'H' format wrapped in unpack at line 21.

Term[Perl]:# perl schreib.pl
Original string : ﻟﻠﺒﻴﻊ
Decoded string 1: ﻟﻠﺒﻴﻊ
Decoded string 2: ﻟﻠﺒﻴﻊ
Decode reverse : ﻊﻴﺒﻠﻟ

Character in 'H' format wrapped in unpack at line 21.
Character in 'H' format wrapped in unpack at line 21.
Character in 'H' format wrapped in unpack at line 21.
Character in 'H' format wrapped in unpack at line 21.
Character in 'H' format wrapped in unpack at line 21.

As shown in the image, the first blank frame is what I copy and paste, and the terminal reverses it on its own. I have to use reverse to get it printed right to left, as in the second frame, which is how it should have looked when pasted.
How do I transform these characters into hexadecimal?

Answer:

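unpack "H*" expects a string of bytes (characters with values 00..FF), but you have a string of Unicode Code Points (characters with values 000000..10FFFF). A character above FF doesn't fit in a byte, so unpack wraps it to its low 8 bits, which is what the warning is telling you.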
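Following from that, one way to get a hex dump with "H*" is to encode the string to UTF-8 first, so unpack only ever sees bytes (the question's $enc is already such a byte string). This is a minimal sketch, assuming it's the UTF-8 bytes rather than the code points you want to see; the characters are written as \x{...} escapes, my substitution for the literal text, so the result doesn't depend on how the source file is saved:

```perl
use strict;
use warnings;
use Encode qw( encode );

# The question's five characters (Arabic presentation forms), written as escapes.
my $str = "\x{FEDF}\x{FEE0}\x{FE92}\x{FEF4}\x{FECA}";

# Encode to UTF-8: the result is a byte string, every character in 00..FF.
my $bytes = encode( 'UTF-8', $str );

# unpack "H*" now only sees bytes, so nothing gets wrapped and no warning is issued.
my $hex = unpack 'H*', $bytes;

print "0x$hex\n";
```

Note that "H*" yields lowercase hex digits with no separators, and it shows the encoded bytes, not the code points.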

You can use

sprintf "%vX", $str

which is effectively the same as

join ".", map sprintf( "%X", ord( $_ ) ), split //, $str

and

join ".", map sprintf( "%X", $_ ), unpack "W*", $str

All three work for any string, whether it holds bytes or Unicode Code Points.

For $str, $dec and $decoded, the above produces

FEDF.FEE0.FE92.FEF4.FECA

For $enc, the above produces

EF.BB.9F.EF.BB.A0.EF.BA.92.EF.BB.B4.EF.BB.8A

(You may get different values, since your source file may not contain exactly the same characters as mine.)
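To check that the three forms agree, here is a small self-contained sketch. As an assumption on my part, it spells the characters with \x{...} escapes so the source file's contents can't change the result, and it runs all three forms on both the character string and its UTF-8 encoding:

```perl
use strict;
use warnings;
use Encode qw( encode );

my $str = "\x{FEDF}\x{FEE0}\x{FE92}\x{FEF4}\x{FECA}";   # Unicode Code Points
my $enc = encode( 'UTF-8', $str );                         # UTF-8 bytes

my @results;
for my $s ( $str, $enc ) {
   my $via_vx     = sprintf "%vX", $s;
   my $via_ord    = join ".", map sprintf( "%X", ord( $_ ) ), split //, $s;
   my $via_unpack = join ".", map sprintf( "%X", $_ ), unpack "W*", $s;

   # All three forms should give identical output for either kind of string.
   die "forms disagree\n"
      unless $via_vx eq $via_ord && $via_ord eq $via_unpack;

   push @results, $via_vx;
   print "$via_vx\n";
}
```

The first line printed is the code-point form and the second the byte form, matching the two results quoted above.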

For a string of Unicode Code Points, we can use charnames (and/or Unicode::UCD) to get more information about each character.

# Continues the question's script: $str, use utf8 and the UTF-8 STDOUT layer come from there.
use charnames qw( :full );
use experimental qw( regex_sets );   # (?[ ]) was experimental before Perl 5.36
use feature qw( say );

for my $ucp ( unpack "W*", $str ) {
   my $ch = chr( $ucp );
   if ( $ch =~ /(?[ \p{Print} - \p{Mark} ])/ ) {   # Not sure if good enough.
      printf "‹%s› ", $ch;
   } else {
      print "--- ";   # Don't print marks or unprintable characters raw.
   }

   printf "U+%04X ", $ucp;

   say charnames::viacode( $ucp );
}

For $str, $dec and $decoded, the above produces

‹ﻟ› U+FEDF ARABIC LETTER LAM INITIAL FORM
‹ﻠ› U+FEE0 ARABIC LETTER LAM MEDIAL FORM
‹ﺒ› U+FE92 ARABIC LETTER BEH MEDIAL FORM
‹ﻴ› U+FEF4 ARABIC LETTER YEH MEDIAL FORM
‹ﻊ› U+FECA ARABIC LETTER AIN FINAL FORM
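Unicode::UCD can go a bit further than charnames. As a sketch (the choice of fields is mine), its charinfo function also reports each character's bidirectional class. These letters are class AL (Arabic Letter), which the Unicode Bidirectional Algorithm uses to decide right-to-left display order; that bears on the reversal described in the question, though how a given terminal applies it is outside what this script can show:

```perl
use strict;
use warnings;
use Unicode::UCD qw( charinfo );

my $str = "\x{FEDF}\x{FEE0}\x{FE92}\x{FEF4}\x{FECA}";

my %bidi_of;
for my $ucp ( unpack "W*", $str ) {
   # charinfo returns a hash ref, or undef for an unassigned code point.
   my $info = charinfo( $ucp )
      or next;

   $bidi_of{$ucp} = $info->{bidi};
   printf "U+%04X %-3s %s\n", $ucp, $info->{bidi}, $info->{name};
}
```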

Data::Dumper with local $Data::Dumper::Useqq = 1; also produces ASCII-only output, with non-ASCII characters escaped.
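A short sketch of that Data::Dumper approach, again using escaped literals. Terse is an extra setting I've added to drop the "$VAR1 = " prefix. Characters above FF come out as \x{...} escapes and the high bytes of the UTF-8 encoding as backslash escapes, so both dumps are plain ASCII; the exact escape style may vary between Data::Dumper versions:

```perl
use strict;
use warnings;
use Data::Dumper;
use Encode qw( encode );

my $str = "\x{FEDF}\x{FEE0}\x{FE92}\x{FEF4}\x{FECA}";

local $Data::Dumper::Useqq = 1;   # double-quoted output, escaping non-printable/non-ASCII
local $Data::Dumper::Terse = 1;   # omit the "$VAR1 = " prefix

my $dump_chars = Dumper( $str );                      # code points above FF as \x{...}
my $dump_bytes = Dumper( encode( 'UTF-8', $str ) );   # high bytes escaped as well

print $dump_chars, $dump_bytes;
```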