Write to console Unicode (UTF-16) text with Windows WinAPI functions?

Time:07-09

I have 64-bit MASM code that writes to a console. The problem is that with WriteConsoleW I can't redirect the output of a command or anything else, since it only writes to the console buffer. But using WriteFile adds spaces between the characters, since the 16-bit chars have their high-order bits zeroed out. How can I print Unicode text with WriteFile?

I have read the related question "Powershell printing output".

CodePudding user response:

The same APIs called from C produce the same console output. WriteConsoleW translates the UTF-16 characters for the console; WriteFile does not. WriteFile just sends bytes to the console, which interprets them in the current code page; for me that is 437 (OEM United States).
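To make the "spaces" concrete, here is a small portable sketch that shows the bytes WriteFile would receive for a UTF-16LE string. The helper name `utf16le_hex` is hypothetical and not part of the original post. Each ASCII character becomes its own byte followed by a 0x00 byte, and that zero byte is the likely source of the gaps when the console interprets the data one byte at a time.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the original post): returns the UTF-16LE
// byte sequence of `text` as a hex string such as "48 00 69 00".
std::string utf16le_hex(const std::u16string& text) {
    std::string out;
    char buf[4];
    for (char16_t unit : text) {
        const unsigned lo = unit & 0xFFu;         // low byte is stored first
        const unsigned hi = (unit >> 8) & 0xFFu;  // high byte: 0 for ASCII
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", lo);
        out += buf;
        out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", hi);
        out += buf;
    }
    return out;
}
```

For example, `utf16le_hex(u"Hi")` yields `48 00 69 00`: two visible letters, each followed by a zero byte.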

I was able to get it to work in C by calling SetConsoleOutputCP(65001) (set the console code page to UTF-8) and then writing a UTF-8 string. Note that the list of code page identifiers does include UTF-16, but that code page is available only to managed applications (e.g. C#).

I printed some non-ASCII to see if it came out correctly.

// compiled with MSVS "cl /W4 /utf-8 test.cpp"
// source saved in UTF-8 as well.
#include <windows.h>

int main() {
    char s[] = u8"Hello, 马克"; // Note: needs a Chinese font; without one, copy/paste
                               // the output into Notepad to see the characters.
    SetConsoleOutputCP(65001);
    auto h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written;
    WriteFile(h, s, sizeof(s) - 1 /* don't send the null */, &written, nullptr);
}

Output:

Hello, 马克

You should be able to adapt this to MASM easily.
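If the MASM program already holds its text as UTF-16, it has to be converted to UTF-8 before the WriteFile call. On Windows the usual route is WideCharToMultiByte with CP_UTF8. The portable sketch below only illustrates what that conversion does; the function name `utf16_to_utf8` and the choice to replace unpaired surrogates with U+FFFD are assumptions made for this example, not something from the original answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not from the original answer: encodes UTF-16 as UTF-8.
// Unpaired surrogates become U+FFFD (an assumption made here for simplicity).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {          // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the pair into one code point above U+FFFF.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   // stray low surrogate
            cp = 0xFFFD;
        }
        if (cp < 0x80) {                             // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                     // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                   // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                     // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting bytes are what would be passed to WriteFile after SetConsoleOutputCP(65001). ASCII text passes through unchanged, so no zero bytes appear between characters.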

If you are willing to use the C runtime library, then _write works for UTF-16 on both the console and files, provided you set the console and file modes appropriately:

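#include <stdio.h>
#include <io.h>
#include <fcntl.h>
#include <sys/stat.h>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    wchar_t s[] = L"Hello, 马克!";
    // Write the characters only, not the terminating null.
    _write(_fileno(stdout), s, sizeof(s) - sizeof(wchar_t));
    // _O_CREAT requires the permission-mode (pmode) argument.
    int fd = _open("test.txt", _O_CREAT | _O_WRONLY | _O_U16TEXT, _S_IREAD | _S_IWRITE);
    _write(fd, s, sizeof(s) - sizeof(wchar_t));
    _close(fd);
}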

Output to console:

Hello, 马克!

The output written to test.txt is encoded in UTF-16LE. Note that 马克 is the two Unicode code points U+9A6C and U+514B. [Image: hexadecimal dump of test.txt]

EDIT

Here's a demo that uses GetFileType to choose between the two APIs. When run directly, it writes to the console correctly. When redirected to a file, e.g. "test > out.txt", the output file contains UTF-16LE-encoded data.

#include <windows.h>

int main()
{
    auto h = GetStdHandle(STD_OUTPUT_HANDLE);
    auto type = GetFileType(h);
    
    WCHAR s[] = L"Only 20\u20AC!";  // U+20AC is the euro sign.
    DWORD written;
    
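    // A console is a character device. Anything else (disk file, pipe)
    // gets the raw UTF-16LE bytes, so piping to another command also works.
    if(type == FILE_TYPE_CHAR)
        WriteConsoleW(h, s, sizeof(s) / sizeof(WCHAR) - 1, &written, nullptr);
    else
        WriteFile(h, s, sizeof(s) - sizeof(WCHAR) /* don't send the null */, &written, nullptr);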
}

Output to console:

Only 20€!

Output redirected to out.txt: [Image: hexadecimal dump of out.txt]
