Why does C code processing UTF-8 use char * instead of unsigned char *?
Time:11-26
I looked through the glib-2.64.4 source code: every UTF-8-related function uses the data type gchar (note: typedef char gchar).
For pure English text that is natural, since UTF-8 is fully compatible with ASCII and fits comfortably in a plain char. The problem is that the UTF-8 bytes of CJK (Chinese, Japanese, Korean) characters all have the high bit set, so processing them through a (possibly signed) char seems to invite inadvertent sign extension — a hidden trap.
Question: why don't they just process UTF-8 with unsigned char? Is it simply laziness, because char is more convenient?
CodePudding user response:
I suspect there are a few reasons they avoided unsigned.
CodePudding user response:
How often do you actually need to do numeric computation on characters? Even when a character's encoding does require arithmetic, whoever writes that code presumably understands the characters involved — would they really overlook this?
The meaning of unsigned is purely mathematical. If the characters genuinely need arithmetic, you can decide whether to use unsigned based on the concrete operations involved.
CodePudding user response:
It depends on the specific processing code: the sign bit of char only matters when the char is used as a small integer.
CodePudding user response:
To code, a piece of memory is whatever you treat it as. Within {int, unsigned int}, {char *, void *, ... *}, or {int8, char, unsigned char}, the differences really aren't that big.
CodePudding user response:
As the poster above says: its only real meaning is "a block of allocated memory".