When is it safe in C to move a pointer outside the object it originally pointed to?

Time:10-11

I want to split a large type into several smaller ones, for example, splitting a uint32_t variable into four chars. Below is my method:

uint32_t big = 0x01234567; // assuming a little-endian machine
char* ptr = (char*)&big + 1; // points to 0x45
*(ptr + 1) = 0xff;
printf("%08" PRIx32 "\n", big); // prints 01ff4567

Is that defined behavior? Or is there a more proper way? By contrast, I know that

    int a = 20, b = 30, c = 40;
    int* d = &b;
    *(d + 1) = 60;  // works with some compilers, fails with others
    for (int i = 0; i < 5; i++)
        printf("%d, ", *(d + i));

is undefined behavior, because we access memory outside the object the pointer points to. If the first case is legal, what is the difference? Does it mean that, as long as the program safely owns the whole region, for example by

  1. declaring a large-type variable uint32_t big = 0x01234567 and doing the things above, or
  2. declaring an array (so we are sure we own a contiguous block of memory), for example,
    int apple[4] = {10,20,30,40};
    int* ptr = &apple[1];
    *(ptr + 1) = 666; // successfully changes the value,
                      // even though ptr is only a pointer to a single int
    for (int k = 0; k < 15; k++) {
        printf("%d\n", *(ptr + k));
    }
    

then accessing memory outside the object the pointer was originally declared to point to is legal? This is quite confusing.

Answer:

Let's take the example of

int apple[4] = {10,20,30,40};
int* ptr = &apple[1];

If we now "draw" out the array and the pointer, it will look something like this:

+----------+----------+----------+----------+
| apple[0] | apple[1] | apple[2] | apple[3] |
+----------+----------+----------+----------+
           ^
           |
           ptr

From that it's easy to see that ptr + 1 will point to apple[2]. It's still inside the original array, so it is not out of bounds and is a valid pointer.


Also remember that for any array or pointer ptr and index i, the expression *(ptr + i) is exactly equal to ptr[i]. The latter (ptr[i]) is usually easier to read and understand, and shorter to write as well.
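To make the bounds concrete, here is a small sketch that reads ptr[k] only when ptr + k stays inside the array. The helper read_after and its parameters are hypothetical, not part of the answer above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reads the element k positions after ptr, where ptr
   points at index `start` of an array with `len` elements. Any k that
   would leave the array is refused instead of dereferenced. */
static bool read_after(const int *ptr, size_t start, size_t len,
                       size_t k, int *out)
{
    assert(out != NULL);
    if (start >= len || k >= len - start)
        return false;     /* ptr + k would be past the last element */
    *out = ptr[k];        /* exactly the same as *(ptr + k) */
    return true;
}
```

With ptr = &apple[1], k = 1 yields apple[2] (30). k = 3 is rejected because ptr + 3 is one past the end of apple: a valid pointer value, but not one that may be dereferenced.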

Answer:

Variables a, b, c and d in your second example do not form an array, so you have no guarantee that they occupy a contiguous area of memory. Compilers are not required to allocate them in any specific order, and they need not even allocate them in memory at all: a variable may be kept in a processor register, or discarded completely if it is unused.

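Hence pointer arithmetic is not defined to bring you from one of them to another. It is only defined within a single array object (a non-array object counts as an array of one element), up to one past its last element.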
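If the values really need to be reachable from one another by pointer arithmetic, one option is to keep them in a single array. A minimal sketch, where the enum names A, B, C and the helper write_next are hypothetical stand-ins for the question's a, b, c:

```c
#include <assert.h>
#include <stddef.h>

enum { A, B, C, COUNT };   /* hypothetical names standing in for a, b, c */

/* Writes value into the element after vals[i]. Because all three values
   live in one array, d + 1 is defined as long as it stays inside vals. */
static int write_next(int vals[COUNT], size_t i, int value)
{
    assert(i < (size_t)COUNT);
    int *d = &vals[i];              /* plays the role of int *d = &b; */
    if (i + 1 >= (size_t)COUNT)
        return 0;                   /* d + 1 would be one past the end: do not dereference */
    *(d + 1) = value;
    return 1;
}
```

Calling write_next(vals, B, 60) sets vals[C] to 60, which is the defined counterpart of the question's *(d + 1) = 60.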

Answer:

uint32_t big = 0x01234567; // assuming a little-endian machine
char* ptr = (char*)&big + 1; // points to 0x45
*(ptr + 1) = 0xff;
printf("%08" PRIx32 "\n", big); // prints 01ff4567

Is that defined behavior?

It is allowed to access the representation of any object via a pointer to a character type. This is a special property of character types, such as char and unsigned char. The method you are using to accomplish that is well defined.
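As a sketch of the same byte-level edit, here is a hypothetical helper set_byte that writes through unsigned char * rather than char *. It assumes 8-bit bytes, so that a uint32_t has four of them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overwrites byte i of *obj's object representation. Access through an
   unsigned char pointer is permitted for any object type; which bits of
   the value byte i holds depends on the machine's byte order. */
static void set_byte(uint32_t *obj, size_t i, unsigned char value)
{
    unsigned char *bytes = (unsigned char *)obj;
    assert(i < sizeof *obj);
    bytes[i] = value;
}
```

On a little-endian machine, set_byte(&big, 2, 0xff) turns 0x01234567 into 0x01ff4567, matching the question; on a big-endian one the result is 0x0123ff67. Using unsigned char also avoids storing 0xff into a char that may be signed, where the result of the conversion is implementation-defined.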

However, many of the details of type representations are left to implementations to decide. Especially notable among these is the order of bytes in various integer types. Your comment // points to 0x45 is thus only one possibility, which you will observe on so-called little-endian machines, such as Intel-based ones. On other machines available today, you might find that ptr initially points to the byte with value 0x23.

Historically, there have been machines that exhibited other byte orders for 32-bit integers, too.
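Because the byte order is left to the implementation, code that cares can inspect it at run time using the same character-pointer access. A hedged sketch with hypothetical helper names, assuming 8-bit bytes and one of the two common byte orders:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the byte stored at offset i of value's object representation. */
static unsigned char byte_at(uint32_t value, size_t i)
{
    const unsigned char *bytes = (const unsigned char *)&value;
    assert(i < sizeof value);
    return bytes[i];
}

/* Nonzero when the least significant byte is stored first (little-endian).
   Assumes the machine is either little- or big-endian. */
static int is_little_endian(void)
{
    return byte_at(1u, 0) == 1;
}
```

With these, byte_at(0x01234567, 1) is 0x45 on a little-endian machine and 0x23 on a big-endian one, which is exactly the difference described above.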


Or is there a more proper way?

Accessing an object's representation via a pointer to char is one thing, but using an analogous approach to access it as other types has undefined behavior. If you want to do this in-place as in your example, then you can do it via a pointer to an appropriate union type. For example:

uint32_t big = 0x01234567;
union parts {
    uint32_t as_u32;    // this member is required, even if it goes unreferenced
    uint16_t as_u16[2];
    uint8_t  as_u8[4];
};
union parts *ptr = (union parts *) &big;

ptr->as_u8[3] = 0xba;
ptr->as_u16[0] = 0xfedc;
printf("%08" PRIx32 "\n", big);  // on a little-endian machine, prints ba23fedc

"Appropriate" means that one of the union members has a type compatible with the type of the object you are trying to access, and others have types representing the views through which you want to access it. Accesses should then explicitly go through the union, as shown.

There may be some disagreement over how "proper" that is, but C allows it, provided that &big is suitably aligned for a union parts object.
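A variant that sidesteps the alignment question, sketched here with a hypothetical helper replace_byte3: copy the value into a union object, write through one member, and read another. C11 states in a footnote to 6.5.2.3 that reading a member other than the one last stored reinterprets the stored bytes. Because it works on a copy, the caller assigns the result back:

```c
#include <assert.h>
#include <stdint.h>

union parts {                 /* same union as in the answer */
    uint32_t as_u32;
    uint16_t as_u16[2];
    uint8_t  as_u8[4];
};
static_assert(sizeof(union parts) == sizeof(uint32_t), "no padding expected");

/* Hypothetical helper: returns big with the byte at as_u8[3] replaced,
   without converting &big to a union pointer. */
static uint32_t replace_byte3(uint32_t big, uint8_t value)
{
    union parts p;
    p.as_u32 = big;           /* store through the uint32_t member */
    p.as_u8[3] = value;       /* modify one byte through another member */
    return p.as_u32;          /* read the whole representation back */
}
```

Usage: big = replace_byte3(big, 0xba); gives 0xba234567 on a little-endian machine, the same first step as the pointer-based example.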


The other main alternative is to use memcpy() to copy part of the representation of an object to or from a different object. For example,

#include <inttypes.h>  // uint16_t, uint32_t, PRIx16, PRIx32
#include <stdio.h>
#include <string.h>    // memcpy

uint32_t big = 0x01234567;
uint16_t medium;

memcpy(&medium, sizeof(medium) + (char *) &big, sizeof(medium));
printf("%04" PRIx16 "\n", medium);  // prints 0123 on a little-endian machine

medium = 0xfedc;
memcpy(&big, &medium, sizeof(medium));
printf("%08" PRIx32 "\n", big); // prints 0123fedc on a little-endian machine
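The same memcpy idea can be packaged into small functions. In this sketch, get_half and set_half are hypothetical names, and which selects the 16-bit half stored at byte offset which * 2; whether that is the numerically high or low half depends on byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the 16-bit half of value stored at byte offset which * 2. */
static uint16_t get_half(uint32_t value, size_t which)
{
    uint16_t half;
    assert(which < 2);
    memcpy(&half, (const unsigned char *)&value + which * sizeof half, sizeof half);
    return half;
}

/* Returns value with the 16-bit half at byte offset which * 2 replaced. */
static uint32_t set_half(uint32_t value, size_t which, uint16_t half)
{
    assert(which < 2);
    memcpy((unsigned char *)&value + which * sizeof half, &half, sizeof half);
    return value;
}
```

For example, set_half(0x01234567, 0, 0xfedc) returns 0x0123fedc on a little-endian machine, as in the example above. memcpy copies bytes and therefore has no alignment requirement, which is one reason it is often preferred over pointer casts.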