C++ issue with conversion of std::string to std::wstring

I'm trying to convert the string "pokémon" from std::string to std::wstring using

std::wstring wsTmp(str.begin(), str.end());

This works on Windows, but on Linux it returns "pok\xffffffc3\xffffffa9mon"

How can I make it work on Linux?

CodePudding user response：

This worked for me on POSIX.

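#include <codecvt>  // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>   // std::wstring_convert (deprecated since C++17)
#include <string>

int main() {
    // Assumes this literal is encoded as UTF-8, so 'é' is the two bytes 0xC3 0xA9.
    std::string a = "pokémon";

    // Decodes the UTF-8 bytes into UTF-16 code units stored in wchar_t.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> cv;
    std::wstring wide = cv.from_bytes(a);  // throws std::range_error on invalid input

    return 0;
}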

The wstring holds the correct string at the end.

Important note by @NathanOliver: std::codecvt_utf8_utf16 was deprecated in C++17 and may be removed from the standard in a future version.

CodePudding user response：

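The problem you're running into is that the iterator-range constructor widens each char separately, so the two UTF-8 code units of é (0xC3 0xA9) are treated as two separate code points. Because char is signed on your platform, those bytes are also sign-extended, which is where \xffffffc3 and \xffffffa9 come from. There's no good way to do this with the standard library from C++17 onward, as std::wstring_convert was deprecated without being given a proper replacement. You have several options, none of them great: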

Use the deprecated std::wstring_convert and ignore the deprecation warnings and the fact that it may be removed in a future revision of C++.
Implement your own widening conversion routine (you could use icu4c's BreakIterator to assist with this).
Use a heavier library like Boost.Locale to do all the heavy lifting for you.
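
As a rough illustration of the second option, here is a minimal sketch of a hand-written decoder. The name utf8_to_utf32 is made up for this example and is not a standard or ICU function. It assumes the input is meant to be UTF-8, reads each byte through unsigned char so that bytes above 0x7F are not sign-extended, and returns a std::u32string. It rejects bad lead bytes, bad continuation bytes and truncated sequences, but it does not check for overlong encodings or surrogate code points, so treat it as a starting point rather than a complete validator.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the standard library): decodes UTF-8 to UTF-32.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that must follow

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append the low 6 payload bits
        }

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For the string in the question, utf8_to_utf32("pok\xC3\xA9mon") gives seven code points, with U+00E9 in the fourth position. On a platform where wchar_t is 32 bits wide (typical on Linux), the result could then be copied element by element into a std::wstring.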

Also, somewhat unrelated: if you care about consistency across platforms, you should be using std::u16string or std::u32string. The character size of std::wstring depends on the size of wchar_t, which varies between compilers and platforms (16 bits on Windows, 32 bits on most Linux toolchains).
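
To make the portability point concrete, here is a small sketch that counts how many code units each string type needs for a single character outside the Basic Multilingual Plane (U+1F600). The function names are invented for this example. The std::u16string and std::u32string results are the same everywhere, while the std::wstring result depends on how wide wchar_t is for the compiler in use.

```cpp
#include <cstddef>
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it as a
// surrogate pair (2 code units) and UTF-32 stores it as 1 code unit.
inline std::size_t units_in_u16() { return std::u16string(u"\U0001F600").size(); }
inline std::size_t units_in_u32() { return std::u32string(U"\U0001F600").size(); }

// For wchar_t the count is platform-dependent: 2 where wchar_t is 16 bits
// wide, 1 where it is 32 bits wide.
inline std::size_t units_in_wide() { return std::wstring(L"\U0001F600").size(); }
```

Code that indexes or measures a std::wstring therefore behaves differently across platforms for such characters, which is the inconsistency the fixed-width string types avoid.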