I'm trying to convert the string "pokémon" from std::string to std::wstring using
std::wstring wsTmp(str.begin(), str.end());
This works on Windows, but on Linux it returns "pok\xffffffc3\xffffffa9mon".
How can I make it work on Linux?
CodePudding user response:
This worked for me on POSIX.
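#include <codecvt>
#include <locale>
#include <string>

int main() {
    std::string a = "pokémon";  // assumed to be UTF-8 encoded
    // Decodes UTF-8 and stores the result as UTF-16 code units in wchar_t.
    // Where wchar_t is 32 bits wide (e.g. Linux), std::codecvt_utf8<wchar_t>
    // gives one element per code point, including outside the BMP.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> cv;
    std::wstring wide = cv.from_bytes(a);  // throws std::range_error on invalid input
    return 0;
}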
The wstring wide holds the correct string at the end.
Important note by @NathanOliver: std::codecvt_utf8_utf16 was deprecated in C++17 and may be removed from the standard in a future version.
CodePudding user response:
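The problem you seem to be running into here is that the é is encoded in UTF-8 as two code units (0xC3 0xA9), and the iterator-range constructor widens each one separately as if it were its own code point. Because char is signed on your Linux platform, each byte is also sign-extended, which is where the \xffffffc3 and \xffffffa9 come from. There's no good way to do this with the standard library past C++17, as std::wstring_convert was deprecated without being given a proper replacement. You have several options, none of them great: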
- Use the deprecated std::wstring_convert and ignore the deprecation warnings and the fact that it may be removed in a future revision of C++.
- Implement your own widening conversion routine (You could use icu4c's BreakIterator to assist with this).
- Use a heavier library like Boost.Locale to do all the heavy lifting for you.
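To give an idea of what the second option involves, here is a minimal sketch of a hand-written widening routine. It assumes the input std::string is UTF-8. The names utf8_to_utf32 and utf8_to_wide are hypothetical helpers made up for this example, not part of the standard library, ICU, or Boost, and the validation is basic rather than exhaustive.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// Throws std::invalid_argument on malformed input.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, surrogates, and values past U+10FFFF.
        static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: widen UTF-8 to std::wstring. Emits one element per
// code point where wchar_t is 32 bits wide (typical on Linux) and UTF-16
// surrogate pairs where it is 16 bits wide (Windows).
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    for (char32_t cp : utf8_to_utf32(in)) {
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this, utf8_to_wide(str) would replace std::wstring wsTmp(str.begin(), str.end()). The key difference from the constructor is that the two bytes of é are combined into the single code point U+00E9 instead of being widened one by one.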
Also somewhat unrelated, but if you care about consistency across different platforms you should be using std::u16string or std::u32string. std::wstring's character size depends on the size of wchar_t, which varies between different compilers and platforms.
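As a small illustration of that last point, the sketch below contrasts the fixed-width string types with wchar_t. The variable names are made up for the example, and the é is written as \u00E9 so the result does not depend on the source file's encoding.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// With the fixed-width types the literal prefix selects the encoding, so the
// element count is the same on every platform.
const std::u16string name16 = u"pok\u00E9mon";  // UTF-16: 7 code units
const std::u32string name32 = U"pok\u00E9mon";  // UTF-32: 7 code points

// The width of wchar_t is implementation-defined: typically 16 bits on
// Windows and 32 bits on Linux and macOS.
constexpr std::size_t wchar_bits = sizeof(wchar_t) * CHAR_BIT;
```

Code that stores text in std::u32string always gets one element per code point, whereas the same text in std::wstring may or may not need surrogate pairs depending on wchar_bits.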