Consider the following two functions. read_file_2 uses the Windows API functions CreateFileW and ReadFile, whereas read_file_1 uses fopen and fgetws, to read non-English text from a file called data.txt.
The ReadFile version outputs garbage text, whereas the fopen version outputs the text from data.txt without any problems.
Notice that the fopen call has ccs=UTF-8 in its mode string, which defines what character encoding to use, whereas read_file_2 does not have anything similar.
DWORD read_file_2()
{
    wchar_t wstr[512];
    BOOL success = FALSE;
    DWORD dwRead, total =0;
    HANDLE handle = CreateFileW(L"data.txt",
                                GENERIC_READ,
                                0,
                                NULL,
                                3,
                                FILE_ATTRIBUTE_NORMAL,
                                NULL);
    if (handle == INVALID_HANDLE_VALUE)
        return -1;
    do
    {
        success = ReadFile(handle, wstr, 20, &dwRead, NULL);
        total = dwRead;
    } while(!success || dwRead == 0);
    wstr[total] = L'\0';
    wprintf(L"%ls\n",wstr);
    return 0;
}
void read_file_1()
{
    wchar_t converted[20];
    FILE * ptr;
    ptr = fopen("data.txt", "rt ,ccs=UTF-8");
    fgetws(converted, 20, ptr);
    wprintf(L"%ls\n", converted);
    fclose(ptr);
}
int main()
{
    _setmode(fileno(stdin), _O_U8TEXT);
    _setmode(fileno(stdout), _O_U8TEXT);
    read_file_1();
    read_file_2();
}
How does one use ReadFile to read wchar_t text from a text file and output it to the terminal without turning it into garbage text?
Output of the program:
Шифрование.txt ال
퀠킨톸톄킀킾킲킰킽킸♥
Actual content of data.txt:
Шифрование.txt العربية.txt
CodePudding user response:
You can use MultiByteToWideChar to convert the bytes read from the file (bytes below, total_bytes of them) from UTF-8 to wchar_t.
int total_wchars = MultiByteToWideChar(
    CP_UTF8,      // CodePage
    0,            // dwFlags
    bytes,        // lpMultiByteStr
    total_bytes,  // cbMultiByte
    NULL,         // lpWideCharStr
    0             // cchWideChar. 0 = Get the required size in wide chars.
);
if ( total_wchars == 0 ) {
    // Error. Use GetLastError() and such.
    ...
}
// The size excludes a NUL unless the input bytes contained one, so leave room for it.
LPWSTR wchars = malloc( ( total_wchars + 1 ) * sizeof( *wchars ) );
MultiByteToWideChar(
    CP_UTF8,      // CodePage
    0,            // dwFlags
    bytes,        // lpMultiByteStr
    total_bytes,  // cbMultiByte
    wchars,       // lpWideCharStr
    total_wchars  // cchWideChar
);
wchars[ total_wchars ] = L'\0';
Note that if the compiler has wchar_t, WCHAR is wchar_t, LPWSTR is wchar_t *, and LPCWSTR is const wchar_t *.
CodePudding user response:
The problem is that ReadFile doesn't read strings or even characters. It reads bytes.
Since it doesn't read strings, it also doesn't null-terminate the data like a string.
You need to make sure that it reads enough bytes, and to null-terminate the array yourself if you want to treat it as a string.
By using a loop you have a good start, but your loop overwrites what was read in the previous iteration, making you lose the data.
You need to pass a pointer to the end of the data already in the buffer on each iteration.
And as I already mentioned in a comment, make sure that the loop works properly (and doesn't go into an infinite loop if there's an error, for example).