I would like to understand why I can print the character 'à' with the functions fopen and fgetc when I read a .txt file but I can't assign it to a char variable and print it with the printf function.
When I read the file texte.txt, the output is:
(Here is a letter that we often use in French: à)
The letter 'à' is correctly read by the fgetc function and assigned to the char c variable
See the code below:
int main() {
FILE *fp;
fp=fopen("texte.txt", "r");
if (fp==NULL) {
printf("erreur fopen");
return 1;
}
char c = fgetc(fp);
while(c != EOF) {
printf("%c", c);
c = fgetc(fp);
}
printf("\n");
return 0;
}
But now if I try to assign the 'à' character to a char variable, I get an error! See the code below:
int main() {
char myChar = 'à';
printf("myChar is: %c\n", myChar);
return 0;
}
ERROR: ./main.c:26:15: error: character too large for enclosing character literal type char myChar = 'à';
My knowledge in C is very insufficient, and I can't find an answer anywhere
CodePudding user response:
To print à
you can use wide character (or wide string):
#include <wchar.h> // wchar_t
#include <stdio.h>
#include <locale.h> // setlocale LC_ALL
int main() {
setlocale(LC_ALL, "");
wchar_t a = L'à';
printf("%lc\n", a);
}
In short: Characters have encoding. Program "locale" chooses what encoding is used by the standard library functions. A wide character represents a locale-agnostic character, a character that is "able" to be converted to/from any locale. setlocale
set's your program locale to the locale of your terminal. This is needed so that printf
knows how to convert wide character à
to the encoding of your terminal. L
in front of a character or string makes it a wide. On Linux, wide characters are in UTF-32.
Handling encodings might be hard. I can point to: https://en.wikipedia.org/wiki/Character_encoding , https://en.wikipedia.org/wiki/ASCII , https://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html , https://en.cppreference.com/w/cpp/locale , https://en.wikipedia.org/wiki/Unicode .
You can encode a multibyte string straight in your source code and output. This will work only if your compiler generates code for the multibyte string in the same encoding as your terminal works with. If you change your terminal encoding, or tell your compiler to use a different encoding, it may fail. On Linux, UTF-8 is everywhere, compilers generate UTF-8 string and terminals understand UTF-8.
const char *str = "à";
printf("%s\n", str);