Printing wide character literals in C

Time:05-19

I am trying to print Unicode to the terminal under Linux using the wchar_t type defined in the wchar.h header. I have tried the following:

#include <wchar.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  /*
  char* direct = "\xc2\xb5";

  fprintf(stderr, "%s\n", direct);
  */

  wchar_t* dir_lit = L"μ";
  wchar_t* uni_lit = L"\u03BC";
  wchar_t* hex_lit = L"\xc2\xb5";

  fwprintf(stderr,
           L"direct: %ls, unicode: %ls, hex: %ls\n",
           dir_lit,
           uni_lit,
           hex_lit);

  return 0;
}

and compiled it using gcc -O0 -g -std=c11 -o main main.c. Run in a terminal with LANG=en_US.UTF-8, this produces the output direct: m, unicode: m, hex: ?u. In hex:

00000000  64 69 72 65 63 74 3a 20  6d 2c 20 75 6e 69 63 6f  |direct: m, unico|
00000010  64 65 3a 20 6d 2c 20 68  65 78 3a 20 3f 75 0a     |de: m, hex: ?u.|
0000001f

The only way I have managed to obtain the desired output of μ is via the code commented out above (a char* built from hex escapes).

I have also tried to print using the wcstombs function:

void print_wcstombs(wchar_t* str)
{
  char buffer[100];

  wcstombs(buffer, str, sizeof(buffer));

  fprintf(stderr, "%s\n", buffer);
}

If I call, for example, print_wcstombs(dir_lit), nothing is printed at all, so this approach does not seem to work either.
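A likely reason nothing is printed: the program is still in the default "C" locale, where μ usually cannot be represented, so wcstombs returns (size_t)-1 and the buffer is not guaranteed to hold a usable string. The original helper never checks that return value. A minimal sketch of the same helper with the result checked (the name print_wcstombs_checked and the out parameter are illustrative additions, not from the question):

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Converts str to the current locale's multibyte encoding and prints it.
   Returns 0 on success, -1 if the conversion fails or would be truncated. */
int print_wcstombs_checked(FILE *out, const wchar_t *str)
{
  char buffer[100];
  size_t n = wcstombs(buffer, str, sizeof buffer);

  if (n == (size_t)-1) {
    /* Some character has no representation in the current LC_CTYPE locale. */
    fputs("wcstombs: character not representable in this locale\n", stderr);
    return -1;
  }
  if (n == sizeof buffer) {
    /* Buffer filled completely: the result is not NUL-terminated. */
    fputs("wcstombs: buffer too small\n", stderr);
    return -1;
  }
  fprintf(out, "%s\n", buffer);
  return 0;
}
```

With setlocale(LC_CTYPE, "") called first under a UTF-8 locale, print_wcstombs_checked(stderr, dir_lit) should print μ; without it, the error branch reports the failure instead of silently printing nothing.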

I would be content with the hex-escape solution in principle; however, the length of the string is not calculated correctly (it should be one character, but it is two bytes long), so formatting via printf does not work correctly.
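strlen counts bytes, so "\xc2\xb5" has length 2 even though it encodes a single character, and printf field widths such as %-5s are counted in bytes as well. A minimal sketch of counting characters instead, assuming the string is valid UTF-8; the helper name utf8_codepoints is hypothetical and not part of any library:

```c
#include <stddef.h>

/* Counts the code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8 input. */
size_t utf8_codepoints(const char *s)
{
  size_t count = 0;

  for (; *s != '\0'; ++s) {
    /* Every code point starts with exactly one non-continuation byte. */
    if (((unsigned char)*s & 0xC0) != 0x80)
      ++count;
  }
  return count;
}
```

A caller could then pad manually by printing width - utf8_codepoints(s) spaces after the string. Note that this counts code points, which is not always the same as terminal columns (combining marks or double-width characters differ).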

Is there any way to handle / print unicode literals the way I intend using the wchar_t type?

Answer:

I compiled and ran your program as-is and got

direct: ?, unicode: ?, hex: ?u

I then included <locale.h> and added setlocale(LC_CTYPE, ""); at the very beginning of main(). When run under a Unicode locale (LANG=en_US.UTF-8), the program produces

direct: μ, unicode: μ, hex: Âµ

(Code point 0xC2 is Â (U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX) in Unicode, and 0xB5 is µ (U+00B5 MICRO SIGN, as opposed to U+03BC GREEK SMALL LETTER MU); hence the characters seen in the 'hex' output. Results might vary in an environment that does not use Unicode for wide characters.)
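Each \x escape in a wide literal produces one wide character with that numeric value, so L"\xc2\xb5" holds two characters rather than one UTF-8-encoded µ. A small sketch contrasting the literals, assuming wide characters are Unicode code points as with glibc on Linux (the array names are illustrative):

```c
#include <wchar.h>

/* Each \x escape yields one wchar_t with that value: two characters here. */
static const wchar_t hex_pair[] = L"\xc2\xb5";   /* U+00C2 then U+00B5 */

/* Universal character names select the intended code point directly. */
static const wchar_t micro_sign[] = L"\u00B5";   /* U+00B5 MICRO SIGN */
static const wchar_t greek_mu[] = L"\u03BC";     /* U+03BC GREEK SMALL LETTER MU */
```

So L"\u00B5" is the single-character literal for the micro sign, and L"\u03BC" (or L"μ") is the Greek letter the question actually prints.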

Basically, to output wide characters you need to set the LC_CTYPE locale, so that stdio knows how to convert them to the multibyte encoding (UTF-8 here) that the terminal expects. A C program starts in the "C" locale until it calls setlocale.
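To make the conversion step concrete, here is a sketch of roughly what a wide-oriented stream does before writing bytes: it converts each wide character with wcrtomb according to the current LC_CTYPE locale. This is an illustration, not glibc's actual stdio code; the function name wide_to_multibyte is hypothetical, and it assumes a stateless encoding such as UTF-8 (no final shift sequence is emitted).

```c
#include <limits.h>  /* MB_LEN_MAX */
#include <locale.h>  /* setlocale(), which callers use to select LC_CTYPE */
#include <string.h>  /* memcpy, memset */
#include <wchar.h>   /* wcrtomb, mbstate_t */

/* Converts ws to multibyte text in the current LC_CTYPE locale, one wide
   character at a time. Returns the byte count (excluding the terminating
   NUL), or (size_t)-1 if a character cannot be represented or buf is too
   small. */
size_t wide_to_multibyte(const wchar_t *ws, char *buf, size_t size)
{
  mbstate_t state;
  char tmp[MB_LEN_MAX];
  size_t used = 0;

  if (size == 0)
    return (size_t)-1;
  memset(&state, 0, sizeof state);

  for (; *ws != L'\0'; ++ws) {
    size_t n = wcrtomb(tmp, *ws, &state);

    if (n == (size_t)-1)        /* EILSEQ: not representable in this locale */
      return (size_t)-1;
    if (used + n >= size)       /* keep one byte for the terminator */
      return (size_t)-1;
    memcpy(buf + used, tmp, n);
    used += n;
  }
  buf[used] = '\0';
  return used;
}
```

In a program that first calls setlocale(LC_CTYPE, "") under LANG=en_US.UTF-8, converting L"\u03BC" this way should yield the two bytes 0xCE 0xBC, the UTF-8 encoding of μ.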


The updated program:

#include <wchar.h>
#include <stdio.h>
#include <locale.h>

int main(int argc, char *argv[])
{
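  /* Take the character-type locale from the environment (e.g. LANG=en_US.UTF-8)
     so that wide output is converted to the terminal's encoding. */
  setlocale(LC_CTYPE, "");

  wchar_t* dir_lit = L"μ";
  wchar_t* uni_lit = L"\u03BC";
  wchar_t* hex_lit = L"\xc2\xb5";  /* two wide characters: U+00C2, U+00B5 */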

  fwprintf(stderr,
           L"direct: %ls, unicode: %ls, hex: %ls\n",
           dir_lit,
           uni_lit,
           hex_lit);

  return 0;
}