I want to split a value of a large type into several values of a smaller type. For example, I want to treat a uint32_t variable as four char values. Here is my method:
uint32_t big = 0x01234567; // little endian
char* ptr = (char*)&big + 1; // points to 0x45
*(ptr + 1) = 0xff;
printf("%08x", big); // 01ff4567
Is that defined behavior? Or is there a more proper way? By contrast, I know that
int a = 20, b = 30, c = 40;
int* d = &b;
*(d + 1) = 60; // some compilers accept this, while others report an error
for (int i = 0; i < 5; i++)
    printf("%d, ", *(d + i));
is undefined behavior, because we access memory outside the object that the pointer points to. If the first case is legal, then what is the difference? Is the difference that the program safely owns the whole region? That would be the case when
- declaring a variable of a big type,
uint32_t big = 0x01234567;
and doing the things above, or
- declaring an array (so we are sure that we own consecutive memory), for example,
int apple[4] = {10,20,30,40};
int* ptr = &apple[1];
*(ptr + 1) = 666; // succeeds in changing the value,
                  // even though ptr is only a pointer to a single int
for (int k = 0; k < 15; k++) {
    printf("%d\n", *(ptr + k));
}
Is it then legal to access memory beyond the extent of the type that the pointer was declared with? This is quite confusing.
CodePudding user response:
Let's take the example of
int apple[4] = {10,20,30,40};
int* ptr = &apple[1];
If we now "draw" out the array and the pointer it will look something like this:
+----------+----------+----------+----------+
| apple[0] | apple[1] | apple[2] | apple[3] |
+----------+----------+----------+----------+
           ^
           |
           ptr
From that it's easy to see that ptr + 1 will point to apple[2]. It's still inside the original array, so it is not out of bounds and is a valid pointer.
Also remember that for any array or pointer ptr and index i, the expression *(ptr + i) is exactly equal to ptr[i]. The latter (ptr[i]) is usually easier to read and understand, and shorter to write as well.
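As a minimal sketch of that equivalence, the two functions below (same_element and sum_range are hypothetical names, not from the question) use both spellings on pointers that stay inside one array. The sketch relies on the rule that a pointer one past the last element may be formed and compared, but not dereferenced.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when p[i] and *(p + i) designate the
   same object. The caller must ensure that p + i stays inside the array
   that p points into. */
int same_element(const int *p, size_t i)
{
    return &p[i] == p + i;
}

/* Hypothetical helper: sums the elements from start up to, but not
   including, end. Both pointers must refer to the same array; end may be
   the one-past-the-end pointer, which is never dereferenced here. */
int sum_range(const int *start, const int *end)
{
    int total = 0;
    for (const int *p = start; p != end; p++) {
        total += *p;
    }
    return total;
}
```

With the apple array from the question, sum_range(&apple[1], apple + 4) adds apple[1], apple[2] and apple[3] without ever stepping outside the array.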
CodePudding user response:
Variables a, b, c and d in your second example do not form an array, so you have no guarantee that they will occupy a contiguous area in memory. Compilers are not required to allocate them in any specific order, and they need not allocate them in memory at all: a variable may be kept in a processor register, or even discarded completely if unused.
Hence pointer arithmetic is not defined to bring you from one of them to another.
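If the values really do need to be neighbours, putting them into one array is what makes the arithmetic defined. The following is only a sketch under that assumption; next_in_array is a hypothetical helper that refuses to step past the last element.

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the element after arr[idx],
   or NULL when idx is the last element or out of range. */
int *next_in_array(int *arr, size_t len, size_t idx)
{
    if (len == 0 || idx >= len - 1) {
        return NULL;
    }
    /* arr + idx + 1 still points inside arr, so the arithmetic is defined */
    return arr + idx + 1;
}
```

For int vals[3] = {20, 30, 40}, the call next_in_array(vals, 3, 1) yields a pointer to vals[2], which may be written through safely, whereas stepping on from vals[2] returns NULL instead of an out-of-bounds pointer.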
CodePudding user response:
uint32_t big = 0x01234567; // little endian
char* ptr = (char*)&big + 1; // points to 0x45
*(ptr + 1) = 0xff;
printf("%08x", big); // 01ff4567
Is that a defined behavior?
It is allowed to access the representation of any object via a pointer to a character type. This is a special property of character types, such as char and unsigned char. The method you are using to accomplish that is well defined.
However, many of the details of type representations are left to implementations to decide. Especially notable among these is the order of bytes in various integer types. Your comment // points to 0x45 is thus only one possibility, which you will observe on so-called little-endian machines, such as Intel-based ones. On other machines available today, you might find that ptr initially points to the byte with value 0x23.
Historically, there have been machines that exhibited other byte orders for 32-bit integers, too.
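One way to see which order a given machine uses is to look at the first byte of a known value through an unsigned char pointer. This is a rough sketch; first_byte and is_little_endian are hypothetical names, and it assumes 8-bit bytes.

```c
#include <stdint.h>

/* Hypothetical helper: returns the byte stored at the lowest address of
   value, read through a pointer to unsigned char. */
unsigned char first_byte(uint32_t value)
{
    const unsigned char *bytes = (const unsigned char *)&value;
    return bytes[0];
}

/* Hypothetical helper: returns 1 when the least significant byte comes
   first in memory (little-endian), 0 for any other byte order. */
int is_little_endian(void)
{
    return first_byte(0x01234567u) == 0x67;
}
```

On a little-endian machine first_byte(0x01234567u) is 0x67, and on a big-endian machine it is 0x01.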
Or is there more proper way?
Accessing an object's representation via a pointer to char is one thing, but using an analogous approach to access it as other types has undefined behavior. If you want to do this in-place as in your example, then you can do it via a pointer to an appropriate union type. For example:
uint32_t big = 0x01234567;
union parts {
    uint32_t as_u32; // this member is required, even if it goes unreferenced
    uint16_t as_u16[2];
    uint8_t as_u8[4];
};
union parts *ptr = (union parts *) &big;
ptr->as_u8[3] = 0xba;
ptr->as_u16[0] = 0xfedc;
printf("%08x", big); // on a little-endian machine, prints ba23fedc
"Appropriate" means that one of the union members has a type compatible with the type of the object you are trying to access, and others have types representing the views through which you want to access it. Accesses should then explicitly go through the union, as shown.
There may be some disagreement over how "proper" that is, but C allows it, provided that &big is suitably aligned for a union parts *.
The other main alternative is to use memcpy() to copy part of the representation of an object to or from a different object. For example,
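// needs <inttypes.h>, <stdint.h>, <stdio.h> and <string.h>
uint32_t big = 0x01234567;
uint16_t medium;
memcpy(&medium, sizeof(medium) + (char *) &big, sizeof(medium)); // copy the two bytes at the higher addresses
printf("%04" PRIx16 "\n", medium); // prints 0123 on a little-endian machine
medium = 0xfedc;
memcpy(&big, &medium, sizeof(medium)); // overwrite the two bytes at the lower addresses
printf("%08x\n", big); // prints 0123fedc on a little-endian machine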
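Applied to the original goal of turning a uint32_t into four bytes, the memcpy() approach might look like the sketch below. The names u32_to_bytes and u32_from_bytes are hypothetical, and the code assumes that sizeof(uint32_t) is 4, which holds wherever bytes are 8 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the four bytes of value into out, in the
   order in which they are stored in memory (so the order depends on the
   machine's endianness). */
void u32_to_bytes(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: rebuilds a uint32_t from four bytes given in
   memory order, i.e. the inverse of u32_to_bytes on the same machine. */
uint32_t u32_from_bytes(const unsigned char in[4])
{
    uint32_t value;
    memcpy(&value, in, sizeof value);
    return value;
}
```

Because the bytes land in a separate array, they can be modified freely and copied back afterwards. Which array index corresponds to which part of the number still depends on the byte order.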