How to use ReadFile with multibyte codes

Time:09-18

How can I use ReadFile to read a file into a wchar_t array and then print it to the console?

DWORD read_output()
{
    BOOL success = FALSE;
    DWORD dwRead;
    HANDLE handle = CreateFileW(L"data.txt",
                                GENERIC_READ,
                                0,
                                NULL,
                                3,
                                FILE_ATTRIBUTE_NORMAL,
                                NULL);
    if (handle == INVALID_HANDLE_VALUE)
        printf("Failed to open file\n");
    do
    {
        wchar_t buffer[128];
        success = ReadFile(handle, buffer, 128, &dwRead, NULL);
        wprintf(L"%s", buffer);
    } while(!success || dwRead == 0);
    return 0;
}


int main()
{
    _setmode(fileno(stdout), _O_U16TEXT);
    read_output();
}

This is the kind of output I get

 ШиÑ
      Ñование.txt  اÙ

what I should get

 Шифрование.txt  العربية.txt

If I remove L"%s", I get this

퀠킨톸톄킀킾킲킰킽킸⺵硴⁴�������⺩硴ੴ

Can someone explain in detail how to read multibyte characters using ReadFile?

CodePudding user response:

Your first output example indicates that the file is encoded in UTF-8, so reading it into a wchar_t buffer won't work. Read the bytes into a char buffer instead, then use MultiByteToWideChar to convert them from UTF-8 to wide characters. ReadFile does not null-terminate the data it reads, so a terminator needs to be added to the end of the string before conversion.

Saving your "what I should get" in data.txt as UTF-8, this works:

#include <windows.h>
#include <fcntl.h>
#include <io.h>
#include <stdio.h>

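DWORD read_output()
{
    BOOL success = FALSE;
    DWORD dwRead = 0;
    HANDLE handle = CreateFileW(L"data.txt",
                                GENERIC_READ,
                                0,
                                NULL,
                                OPEN_EXISTING,
                                FILE_ATTRIBUTE_NORMAL,
                                NULL);
    if (handle == INVALID_HANDLE_VALUE)
    {
        // stdout is in _O_U16TEXT mode, so only wide output functions may be used
        wprintf(L"Failed to open file\n");
        return 1;
    }

    char buffer[128];      // raw UTF-8 bytes from the file
    wchar_t buffer2[128];  // UTF-16 text after conversion
    // Read one byte less than the buffer holds to leave room for the terminator
    success = ReadFile(handle, buffer, sizeof buffer - 1, &dwRead, NULL);
    CloseHandle(handle);
    if (!success)
    {
        wprintf(L"Failed to read file\n");
        return 1;
    }
    buffer[dwRead] = 0;
    // A length of -1 means the input is null-terminated, so the output is terminated too
    if (MultiByteToWideChar(CP_UTF8, 0, buffer, -1, buffer2, _countof(buffer2)) == 0)
    {
        wprintf(L"Failed to convert text\n");
        return 1;
    }
    wprintf(L"%s\n", buffer2);
    return 0;
}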


int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    read_output();
}

The output in the Windows command prompt is shown below. Note that the console font must contain glyphs for these characters for them to display properly; a copy/paste from the console to SO shows the correct ones.

actual console display

Шифрование.txt  العربية.txt