When getting input from std::cin
in windows, the input is apparently always in the encoding windows-1252 (the default for the host machine in my case) despite all the configurations made, that apparently only affect to the output. Is there a proper way to capture input in windows in UTF-8 encoding?
For instance, let's check out this program:
#include <iostream>
int main(int argc, char* argv[])
{
std::cin.imbue(locale("es_ES.UTF-8"));
std::cout.imbue(locale("es_ES.UTF-8"));
std::cout << "ñeñeñe> ";
std::string in;
std::getline( std::cin, in );
std::cout << in;
}
I've compiled it using visual studio 2022 in a windows machine with spanish locale. The source code is in UTF-8. When executing the resulting program (windows powershell session, after executing chcp 65001
to set the default encoding to UTF-8), I see the following:
PS C:\> .\test_program.exe
ñeñeñe> ñeñeñe
e e e
The first "ñeñeñe" is correct: it display correctly the "ñ" caracter to the output console. So far, so good. The user input is echoed back to the console correctly: another good point. But! when it turns to send back the encoded string to the ouput, the "ñ" caracter is substituted by an empty space.
When debugging this program, I see that the variable "in" have captured the input in an encoding that it is not utf-8: for the "ñ" it use only one character, whereas in utf-8 that caracter must consume two. The conclusion is that the input is not affect for the chcp
command. Is something I doing wrong?
UPDATE
Somebody have asked me to see what happens when changing to wcout/wcin:
std::wcout << u"ñeñeñe> ";
std::wstring in;
std::getline(std::wcin, in);
std::wcout << in;
Behaviour:
PS C:\> .\test.exe
0,000,7FF,6D1,B76,E30ñeñeñe
e e e
Other try (setting the string as L"ñeñeñe"):
ñeñeñe> ñeñeñe
e e e
Leaving it as is:
std::wcout << "ñeñeñe> ";
Result is:
eee>
CodePudding user response:
This is the closest to the solution I've found so far:
int main(int argc, char* argv[])
{
_setmode(_fileno(stdout), _O_WTEXT);
_setmode(_fileno(stdin), _O_WTEXT);
std::wcout << L"ñeñeñe";
std::wstring in;
std::getline(std::wcin, in);
std::wcout << in;
return 0;
}
The solution depicted here went in the right direction. Problem: both stdin and stdout should be in the same configuration, because the echo of the console rewrites the input. The problem is the writing of the string with \uXXXX codes.... I am guessing how to overcome that or using #define
's to overcome and clarify the text literals