How can I get the code point from a Unicode value? According to the character code table, the code point for the pictogram '丂' is 8140, and the Unicode value is \u4E02.
I wrote this app in C++ to try to get the code point for a Unicode string value:
#include <iostream>
#include <atlstr.h>
#include <iomanip>
#include <codecvt>

void hex_print(const std::string& s);

int main()
{
    std::wstring test = L"丂"; //assign pictogram directly
    std::wstring test2 = L"\u4E02"; //assign value via Unicode
    // codecvt_utf16 writes each wchar_t as big-endian UTF-16 bytes
    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv1;
    std::string u8str = conv1.to_bytes(test);
    hex_print(u8str);
    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv2;
    std::string u8str2 = conv2.to_bytes(test2);
    hex_print(u8str2);
    return 0;
}

// Print every byte of s as two hex digits
void hex_print(const std::string& s)
{
    std::cout << std::hex << std::setfill('0');
    for (unsigned char c : s)
        std::cout << std::setw(2) << static_cast<int>(c) << ' ';
    std::cout << std::dec << '\n';
}
Output:
00 81 00 40
4e 02
What can I do to get 00 81 00 40, when the value is \u4E02?
Answer:
On Windows you can use WideCharToMultiByte with code page 54936 (GB18030) to convert the UTF-16 string into the bytes you expect:
// Uses the includes and hex_print from the question;
// WideCharToMultiByte is declared in <windows.h>.
int main()
{
    std::wstring test = L"丂"; //assign pictogram directly
    std::wstring test2 = L"\u4E02"; //assign value via Unicode
    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv1;
    std::string u8str = conv1.to_bytes(test);
    hex_print(u8str);
    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv2;
    std::string u8str2 = conv2.to_bytes(test2);
    hex_print(u8str2);
    // First call: ask for the required buffer size (includes the terminating null)
    int len = WideCharToMultiByte(54936, 0, test2.c_str(), -1, NULL, 0, NULL, NULL);
    char* strGB18030 = new char[len + 1];
    // Second call: convert UTF-16 to GB18030 (code page 54936)
    WideCharToMultiByte(54936, 0, test2.c_str(), -1, strGB18030, len, NULL, NULL);
    hex_print(std::string(strGB18030));
    delete[] strGB18030;
    return 0;
}
Output:
4e 02
4e 02
81 40
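A side note on the question's first output line: 00 81 00 40 means the wide string held two separate code units, 0x0081 and 0x0040, instead of the single unit 0x4E02. That is consistent with the compiler misreading the encoding of the source file for the directly typed pictogram (this is an inference from the output, not something the post confirms), which is why the \u4E02 escape is the more reliable spelling. Below is a minimal sketch for inspecting the code units of a std::wstring directly, without codecvt. The helper name code_units_hex is hypothetical and not part of the original post.

```cpp
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): formats each wchar_t
// code unit of s as at least four lowercase hex digits, separated by spaces.
std::string code_units_hex(const std::wstring& s)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0)
            out << ' ';
        // Cast to an unsigned integer so the unit prints as a number,
        // not as a character.
        out << std::setw(4) << static_cast<std::uint32_t>(s[i]);
    }
    return out.str();
}
```

Calling code_units_hex(L"\u4E02") gives "4e02", the Unicode code point itself. The 81 40 bytes only appear after converting to GB18030, as the WideCharToMultiByte call above does.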