I am trying to fetch UTF-8 accentuated characters "é" "ê" from mysql and convert them to UCS-2 when sending over SMPP. The data is stored as utf8_general_ci and I perform the following when opening the DB connection:
$dbh->{'mysql_enable_utf8'}=1;
$dbh->do("set NAMES 'utf8'");
If I test the sending part by hard coding the string value with "é" "ê" using data_encoding=8, it goes through perfectly. However if I comment out the first line and just use what comes from the DB, it fails. Also, if I try to send the characters using the DB and setting data_encoding=3, it also works fine, but then the "ê" would not appear, which is also expected. Here is what I use:
$fred = 'éêcole'; <-- If I comment out this line, the SMPP call fails
$fred = decode('utf-8', $fred);
$fred = encode('UCS-2', $fred);
$resp_pdu = $short_smpp->submit_sm(
source_addr_ton => 0x00,
source_addr_npi => 0x01,
source_addr => $didnb,
dest_addr_ton => 0x01,
dest_addr_npi => 0x01,
destination_addr => $number,
data_coding => 0x08,
short_message => $fred
) or do {
Log("ERROR: submit_sm indicated error: " . $resp_pdu->explain_status());
$success = 0;
};
The different values for the data_coding fields are the following: Meaning of "data_coding" field in SMPP
00000000 (0) - usually GSM7
00000011 (3) for standard ISO-8859-1
00001000 (8) for the universal character set -- de facto UTF-16
The SMPP provider's documentation also mentions that special characters should be handled via UCS-2: https://community.sinch.com/t5/SMS-365-enterprise-service/Handling-Special-Characters/ta-p/1137
How should I prepare the data that is coming out of the DB to make this SMPP call work?
I am using Perl v5.10.1
Thanks !
CodePudding user response:
$dbh->{'mysql_enable_utf8'} = 1;
is used to decode the values returned from the database, causing queries to return decoded text (strings of Unicode Code Points). It makes no sense to decode such a string. Go straight to the encode
.
my $s_ucp = "\xE9\xEA\x63\x6F\x6C\x65"; # éêcole
# -or-
use utf8; # Script is encoded using UTF-8.
my $s_ucp = "éêcole";
printf "%vX\n", $s_ucp; # E9.EA.63.6F.6C.65
my $s_ucs2be = encode('UCS-2', $s_ucp);
printf "%vX\n", $s_ucs2be; # 0.E9.0.EA.0.63.0.6F.0.6C.0.65