How to print unicode codepoint using printf

Time:12-16

I have been trying to print a Unicode string using printf. I have read other answers about setting the locale and about setting the console output code page on Windows with SetConsoleOutputCP. However, there is one problem I couldn't find an answer to. Here is a code sample that reproduces it:

#include <stdio.h>
#include <locale.h>
#include <windows.h>

int main()
{
    setlocale(LC_ALL, "en_US.UTF-8");
    SetConsoleOutputCP(CP_UTF8);

    printf("È\n");          // <-- This line prints È as expected
    printf("\u00C8\n");     // <-- This line does not print anything at all, not even a new line

    return 0;
}

As mentioned in the comment, printf("\u00C8\n"); prints nothing on Windows, but works perfectly on Linux. I would like to know why, and how I can make it print the appropriate Unicode character.

Answer:

The UTF-8 encoding of the character \u00C8 (Latin Capital Letter E with Grave) is \xc3\x88. So with a compiler whose execution character set is UTF-8, both "È\n" and "\u00C8\n" are equivalent to "\xc3\x88\n".

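If you use MSVC, then by default "\u00C8\n" does not produce a UTF-8 string: MSVC encodes narrow string literals in the current code page. If you are working in, say, CP-1252, the code for the character \u00C8 happens to be \xc8, and that single byte is what ends up in the string. A lone \xc8 is not valid UTF-8 (it starts a two-byte sequence, but the next byte is \n, which is not a continuation byte), so it cannot display correctly once the console uses the UTF-8 code page.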

You need the /utf-8 compiler switch to make MSVC generate UTF-8 bytes from universal character names; it sets both the source and the execution character sets to UTF-8. With it, "\u00C8\n" and "È\n" result in exactly the same string (provided your source file is UTF-8 encoded).
