When processing UTF-8 characters in C, why is the data type char * instead of unsigned char *?

Time:11-26

I looked through the glib 2.64.4 source code, and all of the UTF-8 related functions use the data type gchar (note: typedef char gchar).

If the text is English only, UTF-8 is fully compatible with ASCII, so using char is natural. The problem is that the bytes of UTF-8 characters such as Chinese, Japanese, and Korean ones have the highest bit (the sign bit) set. Processing them as char then seems to make accidental sign extension easy. Doesn't that hide a huge trap?

Question: why not process UTF-8 with unsigned char directly? Is it just laziness, because char is easier to use?
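
To make the concern concrete, here is a minimal sketch that is not taken from glib. It uses signed char explicitly to model a platform where plain char is signed, and every function name in it is hypothetical. The byte 0xE4 is the first byte of the UTF-8 encoding of 中 (U+4E2D).

```c
/* Sketch only: models a platform where plain char is signed.
   All function names here are hypothetical, not glib API. */

/* Widening a signed byte to int sign-extends it: 0xE4 becomes negative. */
static int widen_signed(signed char c)
{
    return c;
}

/* Casting to unsigned char first keeps the byte value in 0..255. */
static int widen_cast(signed char c)
{
    return (unsigned char)c;
}

/* Naive range check for the lead byte of a 3-byte sequence (0xE0..0xEF).
   The widened value is at most 127, so the check never succeeds. */
static int is_lead3_naive(signed char c)
{
    int v = c;
    return v >= 0xE0 && v <= 0xEF;
}

/* Same check after the cast; works regardless of char signedness. */
static int is_lead3(signed char c)
{
    int v = (unsigned char)c;
    return v >= 0xE0 && v <= 0xEF;
}
```

So the trap is real, but a cast to unsigned char at the point where the byte is compared or widened is enough to avoid it, even when the pointer type stays char *.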

CodePudding user response:

I feel it may be for a few reasons that have to do with unsigned.

CodePudding user response:

Do you think characters are likely to be used in numerical calculations?
Even if someone does need to do arithmetic on character encodings, anyone able to do that presumably understands characters well enough. Would they really overlook this?

The significance of unsigned is purely mathematical.
If the characters really do need that kind of arithmetic, you can choose unsigned according to the specific operation.

CodePudding user response:

It depends on the specific processing code. The sign bit of char only matters when char is used as a small integer.

CodePudding user response:

For code that simply handles a piece of memory, the differences within {int, unsigned int}, {char *, void *, ... *}, and {int8, char, unsigned char} are really not large.

CodePudding user response:

As the poster above says, the only meaning these types carry here is "a piece of allocated memory".

CodePudding user response:

Quoting the second-floor response by thousand dreams life:
Do you think characters are likely to be used in numerical calculations?
Even if someone does need to do arithmetic on character encodings, anyone able to do that presumably understands characters well enough. Would they really overlook this?
The significance of unsigned is purely mathematical.
If the characters really do need that kind of arithmetic, you can choose unsigned according to the specific operation.


If you consider encrypting characters, which needs bytewise logical operations (^ & | ~), I suspect you would not reach that conclusion so lightly.

CodePudding user response:

Of the bit operations, only the right shift treats signed and unsigned numbers differently; the others are not affected.
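
The right-shift difference can be sketched as follows. The helper names are hypothetical, and signed char again stands in for a platform where plain char is signed. Note that right-shifting a negative value is implementation-defined in C, so the signed result depends on the compiler.

```c
/* Sketch only: hypothetical helpers comparing right shifts on one byte. */

/* Shifts the sign-extended value. Right-shifting a negative int is
   implementation-defined in C; GCC and Clang shift in copies of the
   sign bit, so the result stays negative. */
static int high_nibble_signed(signed char c)
{
    return c >> 4;
}

/* Shifts the zero-extended value, so the result is always in 0..15. */
static int high_nibble_unsigned(signed char c)
{
    return (unsigned char)c >> 4;
}
```

For the byte 0xE4 the unsigned version yields 0x0E, while the signed version does not. ASCII bytes such as 0x41 give the same result either way, which is why the difference is easy to miss when testing with English text only.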

CodePudding user response:

Because of the byte operations involved, I run into sign extension all the time, and it has left me rather frustrated!
I have had to develop a conditioned reflex and, almost hysterically, keep writing:
x & 0xFF
It seems that the moment I am not careful, the sign bit gets me! Ha ha.
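
A minimal sketch of why the x & 0xFF reflex helps, assuming two bytes are being packed into one wider value. The function names are hypothetical, and signed char models a platform where plain char is signed.

```c
#include <stdint.h>

/* Sketch only: hypothetical helpers that pack two bytes into one value. */

/* Without a mask, a low byte with its top bit set is sign-extended
   and its 1-bits overwrite the high byte. */
static uint32_t pack_unmasked(signed char hi, signed char lo)
{
    return ((uint32_t)hi << 8) | (uint32_t)lo;
}

/* Masking each byte with 0xFF discards the sign-extended bits first. */
static uint32_t pack_masked(signed char hi, signed char lo)
{
    return ((uint32_t)(hi & 0xFF) << 8) | (uint32_t)(lo & 0xFF);
}
```

With hi = 0x4E and lo = 0xAD, the unmasked version returns 0xFFFFFFAD and the masked version returns 0x4EAD. Declaring the bytes as unsigned char would make the mask unnecessary.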

CodePudding user response:

Bit operations do not need to take the sign bit into account; the bits are processed the same way either way.
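
This holds as long as the result is stored back into a single byte, as the following sketch suggests. The helper names and the XOR key are hypothetical and chosen only for illustration.

```c
/* Sketch only: hypothetical helpers; the key value is arbitrary. */

/* The int result of the XOR carries sign-extended high bits, but the
   cast back to unsigned char keeps only the low 8 bits. */
static unsigned char xor_signed(signed char a, signed char key)
{
    return (unsigned char)(a ^ key);
}

/* The same operation on unsigned bytes. */
static unsigned char xor_unsigned(unsigned char a, unsigned char key)
{
    return (unsigned char)(a ^ key);
}
```

Both functions return 0xBE for the byte 0xE4 and the key 0x5A. The sign bit only starts to matter once the intermediate int is used without being truncated back to a byte.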

CodePudding user response:

I still don't have a clear understanding of character set encodings.

CodePudding user response:

In C, the char type may behave like signed char or like unsigned char, and which one it matches can be specified at compile time. Otherwise, you would have to ask what Linux specifies at compile time.
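
A small sketch of how code can check which choice the current build made, using CHAR_MIN from the standard header limits.h. The function names are hypothetical. With GCC, the choice can be overridden with the -fsigned-char and -funsigned-char options.

```c
#include <limits.h>

/* Sketch only: hypothetical helpers. */

/* CHAR_MIN is 0 when plain char is unsigned and negative when it is signed. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Reading a byte through unsigned char gives 0..255 on every build,
   whichever signedness plain char has. */
static int byte_value(const char *s)
{
    return (unsigned char)s[0];
}
```

Code that always reads bytes the way byte_value does behaves the same on both kinds of build, which is one way a char * interface can stay safe for UTF-8 data.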