Using UTF-16 for I/O with Visual Studio instead of code pages

Time:03-03

I have this working on Visual Studio 2019 using code pages:

#include <windows.h>
#include <iostream>

int main()
{
    UINT oldcp = GetConsoleOutputCP();  
    SetConsoleOutputCP(932);      //932 = Japanese. 
                                  //1200 for little-, 1201 big-, endian UTF-16     

    DWORD used;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE),L"私の犬\n", 4,&used, 0);

    std::cout << "Hit enter to end."; std::cin.get();
    SetConsoleOutputCP(oldcp); 
    return 0;
}

But Microsoft says I should not be using code pages except to interface with legacy code, and that I should use UTF-16 instead. I can find code page numbers for UTF-16 (little-endian or big-endian), but using them doesn't work, and I would still be using code pages anyway.

So what can I use that accomplishes what my program does, but is up-to-date?

CodePudding user response:

Set stdin and stdout to wide mode in Windows and use wcout and wcin with wide strings. You'll also need a console font that supports the characters and an IME to type them, both of which you can get by installing the appropriate language support. Setting a code page gives you that font switch automatically, but the characters are output correctly even in the "wrong" code page, as long as you select a font that supports them.

#include <iostream>
#include <string>
#include <io.h>
#include <fcntl.h>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    _setmode(_fileno(stdin), _O_WTEXT);

    std::wcout << L"私の犬" << std::endl;
    std::wstring a;
    std::wcout << L"Type a string: ";
    std::getline(std::wcin, a);
    std::wcout << a << std::endl;
    getwchar();
}

Output (terminal using code page 437 but NSimSun font):

私の犬
Type a string: 马克
马克
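
The two _setmode calls in the program above return -1 on failure, which the example ignores. A minimal sketch of wrapping them in a checked helper might look like this; the name enable_wide_console is made up for illustration, and the non-Windows branch simply assumes there is nothing to switch.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

// Hypothetical helper (not part of any library): puts stdout and stdin
// into wide (UTF-16) mode on Windows and reports whether that worked.
bool enable_wide_console()
{
#ifdef _WIN32
    // _setmode returns the previous mode on success and -1 on failure.
    if (_setmode(_fileno(stdout), _O_U16TEXT) == -1) return false;
    if (_setmode(_fileno(stdin), _O_WTEXT) == -1) return false;
    return true;
#else
    // Assumption: outside Windows there is no translation mode to change.
    return true;
#endif
}
```

Call enable_wide_console() at the top of main, before the first write, and bail out if it returns false. Once a stream is in wide mode, everything written to it has to go through the wide interfaces (std::wcout, wprintf); narrow output on that stream is an error in the Microsoft runtime.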

CodePudding user response:

Technically every character encoding is a code page. To use UTF-16 you still have to specify the UTF-16 "code page". But you also need to call _setmode first and then write the wide string through std::wcout, not std::cout:

_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << L"私の犬\n";

But is it up-to-date? No!!! The most reasonable way to print Unicode is to use the UTF-8 code page which will make your app cross-platform and is easier to maintain. See What is the Windows equivalent for en_US.UTF-8 locale? for details on this. Basically just

  • target Windows SDK v17134 or newer, or use static linking to work on older Windows versions
  • change the code page to UTF-8
  • use the -A Win32 APIs instead of -W ones if you're calling those directly (recommended by MS for portability, as everyone else was using UTF-8 for decades)
  • set the /execution-charset:utf-8 and/or /utf-8 flags while compiling

std::setlocale(LC_ALL, ".UTF8");
std::cout << "私の犬\n";
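
Put together, the steps above could look roughly like this. It is only a sketch: enable_utf8_console and sample_text are made-up names, the SetConsoleOutputCP(CP_UTF8) call is one assumed way of doing the "change the code page to UTF-8" step for the console, and the text is spelled out as UTF-8 bytes so the snippet does not depend on the /utf-8 flag.

```cpp
#include <clocale>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: asks the console and the C runtime to use UTF-8.
// Returns false if the runtime rejects the ".UTF8" locale.
bool enable_utf8_console()
{
#ifdef _WIN32
    // CP_UTF8 is code page 65001. The result is ignored here because the
    // call can fail when no console is attached (e.g. output is redirected).
    SetConsoleOutputCP(CP_UTF8);
    return std::setlocale(LC_ALL, ".UTF8") != nullptr;
#else
    // Assumption: on other platforms the terminal already expects UTF-8.
    return true;
#endif
}

// "私の犬\n" written as explicit UTF-8 bytes (three bytes per character).
std::string sample_text()
{
    return "\xE7\xA7\x81\xE3\x81\xAE\xE7\x8A\xAC\n";
}
```

From main, call enable_utf8_console() once and then write std::cout << sample_text(); with /utf-8 set, the plain literal "私の犬\n" produces the same bytes.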

See also Is it possible to set "locale" of a Windows application to UTF-8?
