Why does the size of the char* data type correspond to the computer's word size (4 bytes or 8 bytes)?

As far as I know, the pointer data types (char *, etc.) get the word size of the system. If I think of memory as a grid, one field is the size of one word (so 4 bytes for a 32-bit system and 8 bytes for a 64-bit system). So the idea is to give the pointer exactly one field because that's convenient (better performance?). But then I wonder why a simple char gets only one byte. That would be 1/4 of a field. Why is that? And what happens to the remaining 3 bytes of the box?

Answer:

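The correct way to convert between pointers and integers in C is through the type intptr_t (or its unsigned counterpart uintptr_t) from <stdint.h>. A valid pointer converted to void *, then to intptr_t, and back again yields a pointer that compares equal to the original. These types are optional in the C standard, but practically every hosted implementation provides them.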

Your question is related to how computer hardware is designed, and the C language in turn influenced that hardware design.

A CPU is usually described as a datapath plus control. The datapath is the hard-wired part of the hardware, and it contains buses of N wires: some buses carry addresses and others carry data. For practical reasons, the instruction set provides instructions that access data in different sizes (for example 1, 2, 4 or 8 bytes), depending on the need. If you work with small integers, there is no reason to access them 4 bytes at a time; they can be packed more compactly.

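That said, there are C compilers where a char is 32 bits wide, mostly for DSPs (I have never seen one, but they exist). Even there, sizeof(char) is 1, because sizeof counts in bytes and the byte itself is 32 bits wide on such a platform (CHAR_BIT == 32).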

Answer:

the pointer data types (char *, etc.) get the word size of the system

Not exactly: they usually have the size of the address space, i.e., enough bits to address any location in memory. Note, however, that this can be more bits than the word size (the size of a typical CPU register), as was the case on some older systems: the 8088, 8086 and 80186 had 16-bit registers but a 20-bit address space, and the 80286 had a 24-bit one, so a full address required a pair of words. Compilers for these systems offered several memory models in which data pointers were either 16-bit (near) or 32-bit (far, segment plus offset), depending on how much memory the program needed.

But then I wonder why a simple char gets only one byte.

A byte is, by definition, the smallest unit of memory that can be addressed directly, and the C language maps this unit to the char type. On most systems it is an octet of 8 bits, which is also the minimum width the C standard allows for a char (CHAR_BIT >= 8). Addresses are expressed in this unit even if the data bus is wider: for example, 64-bit Intel processors typically have a 128-bit data bus, yet addresses still count 8-bit bytes. Some specialized CPUs, such as DSPs (digital signal processors), lack this capability and can only address 16-bit or even 32-bit words. On these systems, a byte may be 16 or 32 bits wide, and the C compiler either uses that width for the char type or emulates a smaller char type in software.

what happens to the remaining 3 bytes of the box?

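Nothing special: the box is a group of 2, 4, 8 or more bytes, each of which can be addressed directly and independently, either natively by the hardware or through software emulation. The other bytes are free to hold other data, such as neighboring char variables, or are left as padding when alignment requires it.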