Does `offsetof(struct Derived, super.x) == offsetof(struct Base, x)` hold true in C?

I am unsure what André Caron means here:

... some of this code relies on (officially) non-standard behavior that "just happens" to work on most compilers. The main issue is that the code assumes that &m.base == &m (e.g. the offset of the base member is 0). If that is not the case, then the cast in custom_bar() results in undefined behavior. To work around this issue, you can add an extra pointer in struct foo as such:

m is of type struct meh *. An object f of type struct foo * is assigned to m through a cast to struct meh *. struct meh has a member base of type struct foo (struct foo meh::base = foo::bar). Why is it supposedly not guaranteed that &m.base == &m? I can see this if the structure is not a POD, and André also hints at this. However, why is it necessary for a POD structure to have another pointer, void *foo::hook?

With the hook, struct meh * m = (struct meh*)f; becomes struct meh * m = (struct meh*)f->hook;, and the hook itself is set with m->base.hook = m;.

struct meh
{
   /* inherit from "class foo". MUST be first. */
   struct foo base;
   int more_data;
};
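To make the hook variant concrete, here is a minimal sketch in C. The layout of struct foo, the helper names meh_init and call_bar, and the body of custom_bar are assumptions made for illustration; they are not taken from André Caron's answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "base class"; hook is the extra pointer being discussed. */
struct foo {
    int (*bar)(struct foo *self); /* "virtual" function slot */
    void *hook;                   /* points at the enclosing derived object */
};

struct meh {
    struct foo base; /* first member, as in the original */
    int more_data;
};

/* Override: recover the derived object through the hook, not by casting self. */
int custom_bar(struct foo *self)
{
    struct meh *m = (struct meh *)self->hook;
    return m->more_data;
}

/* Hypothetical constructor-like helper that wires up the slot and the hook. */
void meh_init(struct meh *m, int data)
{
    assert(m != NULL);
    m->base.bar = custom_bar;
    m->base.hook = m;
    m->more_data = data;
}

/* Dispatch that only ever sees the base pointer. */
int call_bar(struct foo *f)
{
    return f->bar(f);
}
```

Calling call_bar(&m.base) after meh_init(&m, 7) returns 7 without custom_bar ever assuming that base sits at offset 0.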

Below, I listed relevant ISO C90/C++98 excerpts from my research. I also created a code example, which can be compiled with Clang via -fsanitize=undefined -std=c++98 -O0 -Wall -Wextra -Wpedantic -Wconversion -Wundef.

Here it is:

https://godbolt.org/z/qo9f8KnYM

Excerpts

From ISO C90 (ANSI C89):

An object shall have its stored value accessed only by an lvalue that has one of the following types: /28/

...

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or

A pointer to a structure object, suitably cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may therefore be unnamed holes within a structure object, but not at its beginning, as necessary to achieve the appropriate alignment.

From ISO C++98:

16 If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

17 A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. ]

Code example

#include <iostream>

struct A {
  int m1;
};

struct B {
  int m1;
  int m2;
};

struct C {
  struct A super;
  int m3;
};

int main(void) {
  struct A a = {42};
  struct C c = {{666}, 1984};

  // Access A::m1 through pointer of type B
  std::cout << ((B *)&a)->m1 << std::endl; // 42

  // Access A::m1 through pointer of type C
  std::cout << ((C *)&a)->super.m1 << std::endl; // 42

  // Access C::super::A::m1 through pointer of type A.
  std::cout << ((A *)(&c))->m1 << std::endl; // 666

  return 0;
}

Edit 1: Let me rewrite this question in this edit section. I will ignore C++, as people in the comments told me to not complicate the question. If this edit is more helpful than the original, then perhaps you can consider replacing the original post with this edit. Or I or someone else can just "strike-through" the original one. Or, if you have a better idea on how to improve my question, please tell me. (I might add that I have issues with attention and get lost in details quite easily... I will leave it at that. You may have guessed what it is...) If my second attempt still fails to deliver, then perhaps I should take my failure to ask a clear question as a hint to think and write it down another time, if applicable. Without further ado, here is my second attempt to pose this question:

I am referring to an answer posted here:

Virtual functions in C

  struct Base {
    int x;
  };

  struct Derived {
    struct Base super;
  };

If offsetof(struct Derived, super) == 0 and offsetof(struct Base, x) == 0, can we then conclude that offsetof(struct Derived, super.x) == offsetof(struct Base, x)?

André Caron suggests using an extra pointer pointing to a derived object. Apparently, it is not sufficient or portable to rely on offsetof(struct Derived, super.x) == offsetof(struct Base, x).

Even though this works, you are relying on compiler extensions for type punning that can lead to undefined behavior blablabla. This works in GCC and MSVC for a fact.

Indeed the alignment stuff relies on compiler extensions. You can make it portable using an extra void* pointer in struct foo that points to the "derived object". However, the technique is sufficiently popular in well-known libraries to be considered "portable". Any compiler that made this type of code break would have lots of complaints from its customers.

I have trouble understanding why offsetof(struct Derived, super.x) != offsetof(struct Base, x) could potentially be the case. I have not found clarification in the C standards. Hence, I am looking for further clarification on that.
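One way to reason about this is that derived.super is itself a struct Base object, so the offset of super.x inside struct Derived should be the offset of super plus the offset of x inside struct Base. The sketch below checks that relationship; the function name offsets_agree is hypothetical, and the nested member designator super.x is used the same way as in the question.

```c
#include <assert.h>
#include <stddef.h>

struct Base {
    int x;
};

struct Derived {
    struct Base super;
};

/* Returns 1 if super.x sits at the same offset in Derived as x does in Base. */
int offsets_agree(void)
{
    size_t nested = offsetof(struct Derived, super.x);
    size_t sum = offsetof(struct Derived, super) + offsetof(struct Base, x);

    /* super is a struct Base, so its members keep their Base offsets. */
    assert(nested == sum);

    return nested == offsetof(struct Base, x);
}
```

Since both structures here are plain C structs, the initial-member rule quoted in the excerpts puts super and x at offset 0, so the comparison is expected to hold.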

13:26, restate my assumptions:

Assuming offsetof(struct Derived, super.x) != offsetof(struct Base, x)

  struct Base {
    int x;
    void *hook;
  };

  struct Derived {
    struct Base super;
  };

With the assumption above, consider:

  struct Base base = {42};
  struct Derived derived;
  base.hook = &base; /* Assuming offsetof(struct Base, x) == 0 */
  derived.super = base;

(struct Base*)(derived.super.hook) == &base shall be true.

#include <stddef.h>
#include <stdio.h>

struct Base {
  int x;
  void *hook;
};

struct Derived {
  struct Base super;
};

int main(void) {
  struct Base base = {42};
  struct Derived derived;
  base.hook = &base; /* Assuming offsetof(struct Base, x) == 0 */
  derived.super = base;

  /* offsetof yields a size_t; cast so that %lu is a matching conversion. */
  printf("Offset Base x: %lu\n", (unsigned long)offsetof(struct Base, x));
  printf("Offset Derived super: %lu\n",
         (unsigned long)offsetof(struct Derived, super));
  printf("Offset Derived super.x: %lu\n",
         (unsigned long)offsetof(struct Derived, super.x));
  printf("Offset Derived super.hook: %lu\n",
         (unsigned long)offsetof(struct Derived, super.hook));
  printf("derived.super.hook == &base, yields %d\n",
         (struct Base *)(derived.super.hook) == &base);

  return 0;
}

Answer:

However, why is it necessary for a POD structure to have another pointer void *foo::hook?

It isn't necessary. From the original question and answer:

This technique is more reliable, especially if you plan to write the "derived struct" in C++ and use virtual functions. In that case, the offset of the first member is often non-0 as compilers store run-time type information and the class' v-table there.

A C++ struct/class with a virtual function is not a POD. Any non-POD structure/class can have a non-0 offset for its data members, and that is the case the hook is there to handle.
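For the plain C case, where no hook is needed, the initial-member rule can be sketched as follows. The helper names as_base and as_derived and the extra member are assumptions added for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Base {
    int x;
};

struct Derived {
    struct Base super; /* must be the first member */
    int extra;
};

/* Upcast: a pointer to Derived, suitably cast, points to its initial member. */
struct Base *as_base(struct Derived *d)
{
    return (struct Base *)d;
}

/* Downcast: only valid if b really points at the super member of a Derived. */
struct Derived *as_derived(struct Base *b)
{
    assert(b != NULL);
    return (struct Derived *)b;
}
```

Both conversions rely only on super being the first member of a C struct. They say nothing about a C++ class with virtual functions, which is the case the hook addresses.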