Unicode to CodePoint C++

Time:02-03

How can I get the code point from a Unicode value? According to the character code table, the code point for the character '丂' is 8140, and its Unicode value is \u4E02.

I wrote this program in C++ to try to get the code point for a Unicode string value:

#include <iostream>
#include <atlstr.h>
#include <iomanip>
#include <codecvt>

void hex_print(const std::string& s);

int main()
{
    std::wstring test = L"丂"; //assign pictogram directly
    std::wstring test2 = L"\u4E02"; //assign value via Unicode

    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv1;
    std::string u8str = conv1.to_bytes(test);
    hex_print(u8str);

    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv2;
    std::string u8str2 = conv2.to_bytes(test2);
    hex_print(u8str2);

    return 0;

}

void hex_print(const std::string& s)
{
    std::cout << std::hex << std::setfill('0');
    for (unsigned char c : s)
        std::cout << std::setw(2) << static_cast<int>(c) << ' ';
    std::cout << std::dec << '\n';
}

Output:

00 81 00 40
4e 02

What can I do to get 00 81 00 40 when the value is given as \u4E02?

Answer:

On Windows you can use WideCharToMultiByte with code page 54936 (GB18030) to convert the UTF-16 string to its GB18030 bytes:

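// Uses the same includes and hex_print as the question;
// WideCharToMultiByte is declared via <windows.h>.
int main()
{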
    std::wstring test = L"丂"; //assign pictogram directly
    std::wstring test2 = L"\u4E02"; //assign value via Unicode

    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv1;
    std::string u8str = conv1.to_bytes(test);
    hex_print(u8str);

    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv2;
    std::string u8str2 = conv2.to_bytes(test2);
    hex_print(u8str2);

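    // Code page 54936 is GB18030. With cchWideChar = -1 the returned
    // length already includes the terminating null.
    int len = WideCharToMultiByte(54936, 0, test2.c_str(), -1, NULL, 0, NULL, NULL);
    char* strGB18030 = new char[len + 1];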
    WideCharToMultiByte(54936, 0, test2.c_str(), -1, strGB18030, len, NULL, NULL);
    hex_print(std::string(strGB18030));
    delete[] strGB18030;

    return 0;

}

Output:

4e 02
4e 02
81 40
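
As a side note on why \u4E02 prints as 4e 02: std::codecvt_utf16 writes big-endian UTF-16 by default, and for a character in the Basic Multilingual Plane the single UTF-16 code unit is the code point itself, so the bytes are just U+4E02 split in two. The value 8140 from the table is the GB18030 (GBK) byte sequence for the same character, which is why it only shows up after converting with code page 54936. The sketch below is portable and not from the thread; utf16be_hex is a hypothetical helper name. It can also be used to try out one possible reading of the question's 00 81 00 40 line, namely that the compiler read the source bytes 81 40 as two separate characters. That reading is an assumption; nothing in the thread confirms it.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the thread): format each UTF-16 code unit
// as two big-endian bytes in lowercase hex, separated by spaces.
std::string utf16be_hex(const std::u16string& s)
{
    std::string out;
    char buf[8];
    for (char16_t unit : s) {
        // High byte first, then low byte (big-endian order).
        std::snprintf(buf, sizeof buf, "%02x %02x",
                      static_cast<unsigned>(unit >> 8),
                      static_cast<unsigned>(unit & 0xFF));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

With this helper, utf16be_hex(u"\u4E02") gives 4e 02, while a string holding the two code units 0x0081 and 0x0040 gives 00 81 00 40.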