I have a problem and I can't find a solution anywhere, so here I am...
My code (only the interesting part):
char string[21];
scanf("%20s",string);
printf("%s", string);
User: 12345678901234567890 ---> 12345678901234567890 (Works fine in that case)
User: éééééééééééééééééééé ---> éééééééééé (The second part is gone because 'é' counts double)
I want the user to enter a name with a max length of 20, so I use scanf("%20s",string) and everything works just fine until you write an accented character... How can I fix that?
NOTES: (edited) If possible, I want to keep using scanf() and not another function (but another function from a standard library (and not an ASCII function lol) is okay).
And most importantly, I can't just write scanf("%40s",string) (I'm using if(scanf("%d %20s %d:%d %c",string)==5) as a condition).
Thanks.
CodePudding user response:
UTF-8 characters have a variable size (1 to 4 bytes each). You can't use %20s to ensure 20 characters, only to ensure 20 bytes (which is what you want the most: to avoid a buffer overflow and to control how many bytes you write).
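One possible way to enforce "20 characters" rather than "20 bytes" is sketched below. It is only an illustration, not the one true fix: utf8_count and read_name are hypothetical helper names, the input is assumed to be valid UTF-8, and sscanf on an already-read line stands in for the scanf call from the question. The idea is to give the buffer room for the worst case (20 characters of 4 bytes each, so %80s into 81 bytes) and then count the characters yourself.

```c
#include <stddef.h>
#include <stdio.h>

/* Count UTF-8 code points: every byte that is not a continuation byte
   (bit pattern 10xxxxxx) starts a new one. Assumes s is valid UTF-8. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: read one word from `line` into `name` (which must
   hold at least 81 bytes) and accept it only if it has at most 20 code
   points. Returns 1 if the name is accepted, 0 otherwise. */
int read_name(const char *line, char *name)
{
    /* 20 code points * 4 bytes at most = 80 bytes, plus the terminator. */
    if (sscanf(line, "%80s", name) != 1)
        return 0;
    return utf8_count(name) <= 20;
}
```

The same %80s width could be dropped into a longer format string such as the one in the question, with the utf8_count check done after scanf returns. Like %20s, the width only caps the bytes stored, so an over-long word is still cut off at 80 bytes.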
Note that you mention "ascii", but your strings are not made of ASCII chars. Those are UTF-8. There is no such thing as é in ASCII.
Understand that, roughly, C doesn't really care about ASCII or UTF-8. It stores the bytes it gets from stdin in a char buffer (they probably come from a terminal emulator, or from another program; the point is that it is not C's job: those are bytes coming from somewhere else, without any meaning besides not being 0), and when you printf them, it just outputs those bytes to stdout (probably a terminal emulator as well, or another program that will read those bytes and decide what to do with them, or, historically, a serial printer, ...).
ASCII devices (and emulators, again, terminal emulators) decide to draw a circle when they are pushed a 48 (0), a vertical bar when they are pushed a 49 (1), ...
UTF-8 devices (or emulators) use a more complex protocol: some bytes mean that they should wait for the next byte to know what to draw.
I am pretty sure you already know that, but I wanted to describe it in a "down to earth" way, to clarify how C doesn't care. C sends 2 bytes (for example). Whether the terminal emulator (or the physical terminal, or the printer, or the program listening on the other side of a pipe, or ...) decides to draw one or two symbols from them (depending on whether that receiving end considers those 2 bytes to be ASCII, UTF-8, or something else) is up to it. The emitting program (your C code) doesn't care.
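To make the "C only sees bytes" point concrete, here is a small sketch that formats the raw bytes of a string as hex. The name bytes_to_hex is made up for this illustration, and the é is written with explicit \x escapes so that the example does not depend on how the source file itself is encoded.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the bytes of s into out as space-separated uppercase hex,
   for example "C3 A9". Returns how many bytes of s were formatted
   (fewer than strlen(s) if out is too small). */
size_t bytes_to_hex(const char *s, char *out, size_t outsz)
{
    size_t used = 0, count = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    for (; *s != '\0'; s++) {
        int n = snprintf(out + used, outsz - used, "%s%02X",
                         count ? " " : "", (unsigned)(unsigned char)*s);
        if (n < 0 || (size_t)n >= outsz - used) {
            out[used] = '\0';   /* drop the partially written item */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Calling bytes_to_hex("\xC3\xA9", buf, sizeof buf) on the UTF-8 encoding of é fills buf with "C3 A9": two bytes for what a UTF-8 terminal draws as one symbol, while "0" gives the single byte "30" (48 in decimal).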
Now, all that being said, there are helper functions in C that are aware of multibyte encodings such as UTF-8, and can count the number of symbols that would be printed by the output device, if it is a UTF-8 one. Look at mblen or mbrtowc, for example (and similar functions of the same family, such as mbstowcs). They follow the encoding of the current locale, so setlocale has to be called first.
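As a rough sketch of how that family can be used to count characters, the function below walks a string with mbrtowc. The name mb_count is an assumption made for this example, and the result depends entirely on the current locale: the program is expected to have called setlocale(LC_ALL, "") at startup, with the user's environment set to a UTF-8 locale.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count the characters of the multibyte string s according to the
   encoding of the current locale. Returns (size_t)-1 if s contains an
   invalid or incomplete sequence for that encoding. */
size_t mb_count(const char *s)
{
    mbstate_t st;
    size_t n = 0, len = strlen(s);

    memset(&st, 0, sizeof st);          /* initial conversion state */
    while (len > 0) {
        size_t k = mbrtowc(NULL, s, len, &st);
        if (k == (size_t)-1 || k == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (k == 0)
            break;                      /* embedded terminator */
        s += k;
        len -= k;
        n++;
    }
    return n;
}
```

Without the setlocale call the program stays in the default "C" locale, where bytes above 127 are not guaranteed to be accepted at all, so a name with accents may be reported as invalid instead of being counted.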