I have been trying to print a Unicode string using printf. I looked through some of the other answers on setting the locale and on setting the console output code page on Windows with SetConsoleOutputCP. However, there is one problem I couldn't find an answer to. Here is a code sample to test it:
#include <stdio.h>
#include <locale.h>
#include <windows.h>

int main()
{
    setlocale(LC_ALL, "en_US.UTF-8");
    SetConsoleOutputCP(CP_UTF8);
    printf("È\n");      // <-- This line prints È as expected
    printf("\u00C8\n"); // <-- This line does not print anything at all, not even a new line
    return 0;
}
As mentioned in the comment, printf("\u00C8\n"); does not print anything on Windows but works perfectly on Linux. I would like to know why, and how I can have it print the appropriate Unicode character.
Answer:
The UTF-8 encoding of the character \u00C8 (Latin Capital Letter E with Grave) is \xc3\x88. So with a compiler that uses UTF-8, both "È\n" and "\u00C8\n" are equivalent to "\xc3\x88\n".
If you use MSVC, then by default "\u00C8\n" does not produce a UTF-8 string. MSVC encodes universal character names using the default code page. If you are working in, say, CP-1252, then the code for the character \u00C8 happens to be \xc8, and that single byte is inserted in the string. A lone \xc8 is not valid UTF-8, so it will not display correctly once the console is set to the UTF-8 code page.
You need to use the /utf-8 compiler switch to make MSVC generate UTF-8 characters from universal character names. If you do, then "\u00C8\n" and "È\n" result in exactly the same string (provided your source file is UTF-8 encoded).