What is forbidden after pointer-casting a big type to a smaller type in C

Say I have a bigger type.

uint32_t big = 0x01234567;

Then what can I do for (char*)&big, the pointer interpreted as a char type after casting?

Is it undefined behavior to shift the address of (char*)&big to (char*&big)+1, (char*&big)+2, etc.?
Is it undefined behavior to both shift and edit (char*)&big+1, as in the example below? I think this example should be undefined behavior because, after casting to (char*), we have limited our view to a single char, and we ought not to access, let alone change, anything outside that scope.

uint32_t big = 0x01234567;
*((char*)&big + 1) = 0xff;
printf("%x\n\n\n", *((char*)&big + 1));
printf("%x\n\n\n", big);

(This compiles with my Visual C++ compiler. By the way, I want to ask a side question: why does the first printf in this example give ffffffff? Shouldn't it be ff?)

I have seen code like this, and it is what I usually do when I need to achieve a similar task. Is this UB or not? Why or why not? What is the standard way to achieve this?

uint8_t catcher[8] = { 0 };
uint64_t big = 0x1234567812345678;
memcpy(catcher, (uint8_t*)&big, sizeof(uint64_t));

CodePudding user response：

Then what can I do for (char*)&big, the pointer interpreted as a char type after casting?

If a char is eight bits, which it is in most modern C implementations, then there are four bytes in the uint32_t big, and you can do arithmetic on the address from (char *) &big + 0 to (char *) &big + 4. You can also read and write the bytes from (char *) &big + 0 to (char *) &big + 3, and those will access individual bytes in the representation of big. Although arithmetic is defined to work up to (char *) &big + 4, that is only an endpoint. There is no defined byte there, and you should not use that address to read or write anything.
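As a minimal sketch of the rule above: the helpers below walk the bytes of a uint32_t through an unsigned char pointer. The names read_bytes and write_byte are hypothetical and exist only for illustration. The end pointer is computed and compared but never dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies the object representation of *value into out, one byte at a time. */
void read_bytes(const uint32_t *value, unsigned char out[sizeof(uint32_t)])
{
    const unsigned char *p = (const unsigned char *)value;
    /* One past the last byte: valid for arithmetic and comparison only. */
    const unsigned char *end = p + sizeof *value;
    size_t i = 0;

    while (p != end)
        out[i++] = *p++;
}

/* Overwrites one byte of *value; index must name a byte inside the object. */
void write_byte(uint32_t *value, size_t index, unsigned char byte)
{
    assert(index < sizeof *value);
    ((unsigned char *)value)[index] = byte;
}
```

Calling write_byte(&big, 1, 0xff) on big = 0x01234567 should leave 0x0123ff67 on a little-endian machine and 0x01ff4567 on a big-endian one, since which byte sits at offset 1 depends on the implementation.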

Is it undefined behavior to shift the address of (char*)&big to (char*&big)+1, (char*&big)+2, etc.?

These are additions, not shifts, and the syntax is (char *) &big + 1, not (char*&big)+1. Arithmetic is defined for the offsets from 0 to 4.

Is it undefined behavior to both shift and edit (char*)&big+1?

It is allowed to read and write the bytes in big using a pointer to char. This is a special rule for character types. Generally, the bytes of an object should not be accessed using an unrelated type. For example, a float object could not be accessed using an int type. However, the character types are special; you may access the bytes of any object using a character type.

However, it is preferable to use unsigned char for this, as it avoids complications with signed values.

I have seen code like this.

It is allowed to read or write the bytes of an object using memcpy. memcpy is defined to work as if by copying characters.
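A small sketch of that round trip, assuming nothing beyond the standard headers: the hypothetical function round_trip copies the bytes of a uint64_t into a byte buffer and back. No cast is needed on &big, because memcpy takes void * parameters.

```c
#include <stdint.h>
#include <string.h>

/* Copies the bytes of big into a buffer, then rebuilds a uint64_t from them. */
uint64_t round_trip(uint64_t big)
{
    unsigned char catcher[sizeof big];
    uint64_t copy;

    memcpy(catcher, &big, sizeof big);    /* object -> bytes */
    memcpy(&copy, catcher, sizeof copy);  /* bytes -> object */
    return copy;
}
```

Sizing the buffer with sizeof big rather than a literal 8 keeps the two sizes from drifting apart.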

Note that, while accessing the bytes of an object is defined by the C standard, how bytes represent values is partly implementation-defined. Different C implementations may use different orders for the bytes within an object, and there can be other differences.
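To illustrate the difference between bytes in storage and bytes of the value, here is a sketch with two hypothetical helpers. byte_by_value uses shifts, which operate on the value and so give the same result on every implementation. low_byte_is_first inspects the stored representation, so its result depends on the implementation's byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Returns byte number index of value, counting from the least significant
   byte. Shifts work on the value, not on the stored representation. */
unsigned char byte_by_value(uint32_t value, unsigned index)
{
    assert(index < sizeof value);
    return (unsigned char)((value >> (8u * index)) & 0xffu);
}

/* Returns 1 if the least significant byte is stored at the lowest address
   (little-endian), 0 otherwise. */
int low_byte_is_first(void)
{
    uint32_t probe = 0x01234567;
    return *(unsigned char *)&probe == 0x67;
}
```

When the goal is a fixed byte layout, for example for a file or network format, the shift-based form is usually the safer choice.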

By the way, I want to ask a side question: why does the first printf in this example give ffffffff? Shouldn't it be ff?

In your C implementation, char is signed and can represent values from −128 to 127. In *((char*)&big + 1) = 0xff;, 0xff is 255 and is too big to fit into a char. It is converted to a char value in an implementation-defined way. Your C implementation converts it to −1. (The eight-bit two’s complement representation of −1, bits 11111111, uses the same bits as the binary representation of 255, again bits 11111111.)

Then printf("%x\n\n\n", *((char*)&big + 1)); passes this value, −1, to printf. Since it is a char, it is promoted to int to be passed to printf. This produces the same value, −1, but it has 32 bits, 11111111111111111111111111111111. Then you are passing an int, but printf expects an unsigned int for %x. The behavior of this is not defined by the C standard, but your C implementation reads the 32 bits as if they were an unsigned int. As an unsigned int, the 32 bits 11111111111111111111111111111111 represent the value 4,294,967,295 or 0xffffffff, so that is what printf prints.

You can print the correct value by using printf("%hhx\n\n\n", * ((unsigned char *) &big + 1));. As an unsigned char, the bits 11111111 represent 255 or 0xff, and converting that to an int produces 255 or 0x000000ff.
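The promotion described above can be observed directly. This is a sketch with hypothetical helper names; it assumes a two's complement implementation, so that a signed char with all bits set holds −1.

```c
#include <stdint.h>
#include <stdio.h>

/* Reads byte 1 of *value as a signed char; the return converts it to int. */
int as_signed_int(const uint32_t *value)
{
    return *((const signed char *)value + 1);
}

/* Reads byte 1 of *value as an unsigned char; the return converts it to int. */
int as_unsigned_int(const uint32_t *value)
{
    return *((const unsigned char *)value + 1);
}

/* Formats byte 1 of *value in hexadecimal, using the unsigned char type and
   the matching %hhx conversion. Returns what snprintf returns. */
int format_byte_hex(const uint32_t *value, char *buf, size_t n)
{
    return snprintf(buf, n, "%hhx", *((const unsigned char *)value + 1));
}
```

If byte 1 holds the bits 11111111, as_signed_int yields −1 and as_unsigned_int yields 255, and format_byte_hex writes ff.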

CodePudding user response：

For variadic functions (like printf), the variable arguments undergo the default argument promotions, which promote integer types smaller than int to int.

This conversion includes sign extension if the smaller type is signed, so the value is preserved.

So if char is a signed type (which is implementation-defined) and holds the value -1, it is promoted to the int value -1. With a 32-bit int, that value has all bits set, which is why you see ffffffff.

If you want to print a smaller type you need to first of all cast to the correct type (unsigned char) then use the proper format (like %hhx for printing unsigned char values).