I define relative pointer to mean what Ginger Bill describes as Self-Relative Pointers:
... define the base [to which an offset will be applied] to be the memory address of the offset itself
For example, consider this struct:
struct house {
int32_t weight;
};
struct person {
int32_t age;
struct house* residence;
};
int32_t getPersonsHousesWeight(struct person* p) {
return p->residence->weight;
}
The relative-pointer implementation of the same thing in C that I think might work is:
struct house { ... }; // same as before
struct person {
int32_t age;
int64_t residence; // an offset in bytes from the person's address in memory
};
int32_t getPersonsHousesWeight(struct person* p) {
return ((struct house*)((char*)p + p->residence))->weight;
}
Assuming that alignment of everything is good (all 8 bytes), is this free of undefined behavior?
EDIT
@tstanisl has provided an excellent answer (which I've accepted) that thoroughly explains UB in the context of stack allocations. I am curious how allocation into a large slab of contiguous heap would impact this analysis. For example:
int foo(void) {
char* base = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// Omitting mmap error checking
struct person* myPerson = (struct person*)(base + 128);
struct house* myHouse = (struct house*)(base + 256);
int32_t delta = (char*)myHouse - (char*)myPerson;
// Does the computation of delta invoke UB?
}
Answer:
Usually it is going to be UB.
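The first case is when person and house belong to separate objects. In that case it is UB because the pointer arithmetic is performed outside of the object: subtracting two pointers is only defined when both point into (or one past the end of) the same array object.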
int foo(void) {
struct person p;
struct house h;
p.residence = (char*)&h - (char*)&p; // already UB
getPersonsHousesWeight(&p); // UB again
}
In practice this means that the compiler is not obliged to notice that objects accessed through pointers constructed from &p can alias with the object h, because p and h are separate memory regions (aka objects).
When both objects are placed inside a larger object, the situation is a bit better, though it is still technically UB.
int foo(void) {
struct ph {
struct person p;
struct house h;
} ph;
ph.p.residence = (char*)&ph.h - (char*)&ph.p; // still UB
getPersonsHousesWeight(&ph.p); // UB again
}
It is UB because the pointer arithmetic is done outside the member object: (char*)&ph.h - 1 is a pointer outside of ph.h.
Note that this code will likely work pretty much everywhere. Otherwise, heavily used container_of-like macros would not work, breaking a lot of existing code, including the Linux kernel.
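For context, a container_of-like macro is usually written along the following lines. This is a simplified sketch rather than the Linux kernel's exact definition; CONTAINER_OF, struct ph and age_from_house are illustrative names, and struct person is reduced to a single field to keep the example short. It steps backwards from a member pointer to the enclosing struct, which is exactly the kind of arithmetic discussed above: it works in practice but leaves the member object.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified container_of-style macro (illustrative, not the kernel's definition).
 * Given a pointer to a member, step back by the member's offset to reach
 * the enclosing struct. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct house { int32_t weight; };
struct person { int32_t age; };   /* reduced for the sketch */
struct ph {
    struct person p;
    struct house h;
};

/* Hypothetical helper: reach the sibling member p starting from a pointer to h.
 * The pointer is derived from &ph.h, so strictly speaking the subtraction
 * leaves the member object h, as described in the text. */
int32_t age_from_house(struct house *h) {
    struct ph *outer = CONTAINER_OF(h, struct ph, h);
    return outer->p.age;
}
```

Calling age_from_house(&x.h) on a struct ph x returns x.p.age on mainstream compilers, which is why so much existing code relies on this pattern despite its formal status.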
To avoid UB, the pointers must be constructed so that they never move outside of the originating object. Rather than using &ph.h, one should use (char*)&ph + offsetof(struct ph, h). Similarly, &ph.p should be replaced with (char*)&ph + offsetof(struct ph, p).
Now this code should be portable:
int foo(void) {
struct ph {
struct person p;
struct house h;
} ph;
struct person *p_ptr = (struct person*)((char*)&ph + offsetof(struct ph, p));
struct house *h_ptr = (struct house*)((char*)&ph + offsetof(struct ph, h));
ph.p.residence = (char*)h_ptr - (char*)p_ptr;
getPersonsHousesWeight(p_ptr);
}
Though it is very obscure. An interesting discussion on this topic can be found at link
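The same idea can be applied to the slab case from the question's edit. The sketch below is an illustration under stated assumptions, not part of the analysis above: it uses malloc instead of mmap so that it stays within standard C, slab_demo is a hypothetical name, and the offsets 128 and 256 are taken from the question. Both pointers are derived from the single base pointer of one allocation, so the subtraction is between two pointers into the same block, which mirrors the offsetof-based construction. Whether memory obtained from mmap is treated the same way is outside what the C standard itself specifies.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct house { int32_t weight; };
struct person {
    int32_t age;
    int64_t residence; /* byte offset from the person's own address */
};

int32_t getPersonsHousesWeight(struct person *p) {
    return ((struct house *)((char *)p + p->residence))->weight;
}

/* Hypothetical demo: places a person and a house in one allocation,
 * links them with a relative offset, and reads the weight back.
 * Returns -1 if the allocation fails. */
int32_t slab_demo(int32_t weight) {
    char *base = malloc(4096);
    if (base == NULL) {
        return -1;
    }
    /* Both pointers are computed from the same base pointer. */
    struct person *myPerson = (struct person *)(base + 128);
    struct house *myHouse = (struct house *)(base + 256);

    myHouse->weight = weight;
    myPerson->age = 30;
    myPerson->residence = (char *)myHouse - (char *)myPerson; /* 128 here */

    int32_t result = getPersonsHousesWeight(myPerson);
    free(base);
    return result;
}
```

Storing the offset as int64_t keeps the field 8-byte aligned, matching the alignment assumption stated in the question.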