For this code:
#include <codecvt>
#include <iomanip>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>

int main()
{
    std::wstring wstr = L"é";
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    std::stringstream ss;
    ss << std::hex << std::setfill('0');
    for (auto c : myconv.to_bytes(wstr))
    {
        ss << std::setw(2) << static_cast<unsigned>(c);
    }
    std::string ssss = ss.str();
    std::cout << "ssss = " << ssss << std::endl;
}
Why does this print ffffffc3ffffffa9 instead of c3a9?
Why is ffffff prepended to each byte? You can run it on ideone: https://ideone.com/qZtGom
CodePudding user response:
c is of type char, which is signed on most systems.
Converting a negative char to unsigned sign-extends the value.
Examples:
- char(0x23) aka 35 --> unsigned(0x00000023)
- char(0x80) aka -128 --> unsigned(0xFFFFFF80)
- char(0xC3) aka -61 --> unsigned(0xFFFFFFC3)
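The conversions in the list above can be checked with a small sketch. The function names widen_direct and widen_via_unsigned_char are hypothetical, and signed char and std::uint32_t are used on purpose so the result does not depend on whether plain char is signed or how wide unsigned is on your platform:

```cpp
#include <cstdint>

// Direct conversion: a negative value wraps modulo 2^32,
// which has the same effect as sign extension.
std::uint32_t widen_direct(signed char c) {
    return static_cast<std::uint32_t>(c);
}

// Converting to unsigned char first reduces the value to 0..255,
// so the later widening only adds zero bits.
std::uint32_t widen_via_unsigned_char(signed char c) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(c));
}
```

With these, widen_direct(-61) gives 0xFFFFFFC3, while widen_via_unsigned_char(-61) gives 0xC3.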
[edit: My first suggestion didn't work; removed]
You can cast it twice:
ss << std::setw(2) << static_cast<int>(static_cast<unsigned char>(c));
The first (inner) cast gives you an unsigned type with the same bit pattern, and since unsigned char is the same size as char, there is no sign extension.
But if you just output static_cast<unsigned char>(c), the stream treats it as a character and writes the raw byte, which shows up as some glyph or garbage depending on your locale and terminal.
The second (outer) cast gives you an int, which the stream outputs as a number.
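Putting the double cast into a helper might look like the sketch below. The name to_hex is hypothetical, and it takes the already-encoded bytes as a std::string, so it does not depend on wstring_convert (which is deprecated since C++17):

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats each byte of `bytes` as two lowercase hex digits.
std::string to_hex(const std::string& bytes) {
    std::ostringstream ss;
    ss << std::hex << std::setfill('0');
    for (char c : bytes) {
        // Inner cast: map the char to 0..255 so nothing is sign-extended.
        // Outer cast: make the stream print a number, not a character.
        ss << std::setw(2) << static_cast<int>(static_cast<unsigned char>(c));
    }
    return ss.str();
}
```

For the code in the question, to_hex(myconv.to_bytes(wstr)) should then give c3a9, since to_hex("\xC3\xA9") returns "c3a9".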