Pointer Cast in C

I have the following three structs.

struct A {
  int a;
  int bOffset;
  int cOffset;
};

struct B {
  long long b;
  int other[];
};

struct C {
  long long c;
  int other[];
};

The main function is like:

int main(void) {
  int otherSize = 0;
  scanf("%d", &otherSize);

  int aSize = sizeof(struct A);
  int bSize = sizeof(struct B) + sizeof(int) * otherSize;
  int cSize = sizeof(struct C) + sizeof(int) * otherSize;
  int totalSize = aSize + bSize + cSize;

  struct A *a = malloc(totalSize);
  a->bOffset = aSize;
  a->cOffset = aSize + bSize;

  struct B *b = (struct B*)((char*)a + a->bOffset);
  struct C *c = (struct C*)((char*)a + a->cOffset);

  ......
}

The space for struct A, struct B and struct C is allocated together to get better cache behavior. My question is: according to previous posts on SO, the cast

  struct B *b = (struct B*)((char*)a + a->bOffset);
  struct C *c = (struct C*)((char*)a + a->cOffset);

is undefined behavior in C because struct B and struct C have stricter alignment requirements than struct A. Then what can I do so that the cast is well-defined in C?

What I can come up with now is to add a long long variable to struct A as follows.

struct A {
  int a;
  int bOffset;
  int cOffset;
  long long unused;
};

Another question is if I dereference b or c, it is also UB. Is there any way to address this issue?

CodePudding user response：

Then what can I do so that the cast is well-defined in C?

To calculate correctly where struct B and struct C should be placed, you should pad the preceding sizes to the necessary alignment. C provides the _Alignof operator, which gives the alignment requirement of a type. So this code will do the job:

/*  Calculate how many bytes are required to add to size s to make it be a
    multiple of alignment a.  If s is a multiple of a, this is zero.
    Otherwise, we need to add a-r bytes, where r is the remainder of s divided
    by a.

    Omitting the parentheses used for macro parameters, the following code is
    a - (s-1)%a - 1.  To see it works, consider two cases:


        s is a multiple of a.  Then s-1 is a-1 modulo a, and the expression
        evaluates to a - (a-1) - 1 = 0.

        s has some non-zero remainder r modulo a.  Then (s-1)%a evaluates to
        r-1, and the expression evaluates to a - (r-1) - 1 = a-r.
*/
#define PadToAlignment(s, a)    ((a) - ((s)-1) % (a) - 1)

…

    //  Add padding needed to align struct B and struct C correctly.
    aSize += PadToAlignment(aSize,         _Alignof (struct B));
    bSize += PadToAlignment(aSize + bSize, _Alignof (struct C));
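
To make the macro's arithmetic concrete, here is a minimal sketch that wraps it in a helper. The name PaddedSize is hypothetical and not part of the code above, and the sizes in the usage note are typical values rather than anything the C standard guarantees.

```c
#include <assert.h>
#include <stddef.h>

/*  Same macro as above: the number of bytes to add to size s to make it a
    multiple of alignment a.  */
#define PadToAlignment(s, a)    ((a) - ((s)-1) % (a) - 1)

/*  Hypothetical helper, for illustration only: return size s rounded up to
    a multiple of alignment a.  Assumes s >= 1 and a >= 1.  */
static size_t PaddedSize(size_t s, size_t a)
{
    assert(s >= 1 && a >= 1);
    return s + PadToAlignment(s, a);
}
```

For example, if sizeof (struct A) is 12 and _Alignof (struct B) is 8, then (12-1) % 8 is 3, the padding is 8 - 3 - 1 = 4, and PaddedSize(12, 8) is 16, so struct B would start at offset 16.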

Notes

You should generally use size_t rather than int for sizes. Also, when using types with sizeof and _Alignof, I prefer not to write them like function calls, as in sizeof(int), because they are not function calls. Rather, they are operators with an operand that is a type name enclosed in parentheses for grammatical reasons, so sizeof (int) helps remind readers of the meaning of the C code.

Here is a complete program incorporating these:

#include <stdio.h>
#include <stdlib.h>


/*  Calculate how many bytes are required to add to size s to make it be a
    multiple of alignment a.  If s is a multiple of a, this is zero.
    Otherwise, we need to add a-r bytes, where r is the remainder of s divided
    by a.

    Omitting the parentheses used for macro parameters, the following code is
    a - (s-1)%a - 1.  To see it works, consider two cases:


        s is a multiple of a.  Then s-1 is a-1 modulo a, and the expression
        evaluates to a - (a-1) - 1 = 0.

        s has some non-zero remainder r modulo a.  Then (s-1)%a evaluates to
        r-1, and the expression evaluates to a - (r-1) - 1 = a-r.
*/
#define PadToAlignment(s, a)    ((a) - ((s)-1) % (a) - 1)


struct A {
  int a;
  int bOffset;
  int cOffset;
};

struct B {
  long long b;
  int other[];
};

struct C {
  long long c;
  int other[];
};


int main(void)
{
    int otherSize = 0;
    if (1 != scanf("%d", &otherSize))
    {
        fprintf(stderr, "Error, scanf failed.\n");
        exit(EXIT_FAILURE);
    }

    size_t aSize = sizeof (struct A);
    size_t bSize = sizeof (struct B) + sizeof (int) * otherSize;
    size_t cSize = sizeof (struct C) + sizeof (int) * otherSize;

    //  Add padding needed to align struct B and struct C correctly.
    aSize += PadToAlignment(aSize,         _Alignof (struct B));
    bSize += PadToAlignment(aSize + bSize, _Alignof (struct C));

    size_t totalSize = aSize + bSize + cSize;

    unsigned char *RawMemory = malloc(totalSize);
    if (!RawMemory)
    {
        fprintf(stderr, "Error, unable to allocate memory.\n");
        exit(EXIT_FAILURE);
    }

    struct A *a = (struct A *) RawMemory;
    a->bOffset = aSize;
    a->cOffset = aSize + bSize;

    struct B *b = (struct B *) (RawMemory + a->bOffset);
    struct C *c = (struct C *) (RawMemory + a->cOffset);

    printf("a is at %p.\n", (void *) a);
    printf("b is at %p.\n", (void *) b);
    printf("c is at %p.\n", (void *) c);

    free(RawMemory);
}

Another question is if I dereference b or c, it is also UB.

Memory allocated by malloc has no effective type. It may be used as any type by storing data there through an lvalue of that type.

The C standard’s rules about effective types in dynamically allocated memory involving structures are incomplete; the natural language wording is insufficient to write a formal semantic description. Certainly it is clear that if a struct S has members a, b, and c and no others, and we do:

struct S *p = malloc(sizeof *p);
p->a = 3;
p->b = 4;
p->c = 5;

then, for aliasing considerations, there should be a struct S at the memory address p even though the memory has only been written in parts, never with a full struct S lvalue. But the standard’s rules about effective type do not make this clear; they are simply inadequate.

Putting multiple structures in the memory further complicates this. However, the purpose of the standard’s aliasing rules is to specify when objects can or cannot be aliased (and hence what optimizations the compiler can do regarding these). For practical purposes, as long as you use these structures in normal ways (the memory you designate for struct A as a struct A, the memory you designate for struct B as a struct B, and the memory you designate for struct C as a struct C), then you are not aliasing the memory with other types, and compilers are not going to perform unexpected optimizations that break that. I expect it is safe to use the allocated memory in this way.
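
As an illustration of using the structures "in normal ways", here is a minimal sketch that packages the layout above into helpers. The names AllocateABC, GetB, and GetC are hypothetical, and the sketch assumes n is small enough that the size arithmetic does not overflow and the offsets fit in an int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PadToAlignment(s, a)    ((a) - ((s)-1) % (a) - 1)

struct A { int a; int bOffset; int cOffset; };
struct B { long long b; int other[]; };
struct C { long long c; int other[]; };

/*  Hypothetical helper: allocate A, B, and C in one block, with n trailing
    ints each for B and C, and record the offsets in A.  Returns NULL if
    malloc fails.  Assumes n is small (no overflow checks).  */
static struct A *AllocateABC(size_t n)
{
    size_t aSize = sizeof (struct A);
    size_t bSize = sizeof (struct B) + sizeof (int) * n;
    size_t cSize = sizeof (struct C) + sizeof (int) * n;

    //  Pad so that struct B and struct C start at aligned offsets.
    aSize += PadToAlignment(aSize,         _Alignof (struct B));
    bSize += PadToAlignment(aSize + bSize, _Alignof (struct C));

    unsigned char *raw = malloc(aSize + bSize + cSize);
    if (!raw)
        return NULL;

    struct A *a = (struct A *) raw;
    a->a       = 0;
    a->bOffset = (int) aSize;
    a->cOffset = (int) (aSize + bSize);
    return a;
}

/*  Hypothetical accessors: each region is only ever used as its own type.  */
static struct B *GetB(struct A *a)
{
    unsigned char *p = (unsigned char *) a + a->bOffset;
    assert((uintptr_t) p % _Alignof (struct B) == 0);
    return (struct B *) p;
}

static struct C *GetC(struct A *a)
{
    unsigned char *p = (unsigned char *) a + a->cOffset;
    assert((uintptr_t) p % _Alignof (struct C) == 0);
    return (struct C *) p;
}
```

A caller would write through GetB(a)->other[i] and GetC(a)->other[i] for i below n, and release everything with a single free(a), since a points to the start of the allocation.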