I have the following three structs.
struct A {
int a;
int bOffset;
int cOffset;
};
struct B {
long long b;
int other[];
};
struct C {
long long c;
int other[];
};
The main function looks like this:
int main(void) {
int otherSize = 0;
scanf("%d", &otherSize);
int aSize = sizeof(struct A);
int bSize = sizeof(struct B) + sizeof(int) * otherSize;
int cSize = sizeof(struct C) + sizeof(int) * otherSize;
int totalSize = aSize + bSize + cSize;
struct A *a = malloc(totalSize);
a->bOffset = aSize;
a->cOffset = aSize + bSize;
struct B *b = (struct B*)((char*)a + a->bOffset);
struct C *c = (struct C*)((char*)a + a->cOffset);
......
}
The memory for struct A, struct B, and struct C is allocated together for better cache behavior. My question is, according to previous posts on SO, the cast
struct B *b = (struct B*)((char*)a + a->bOffset);
struct C *c = (struct C*)((char*)a + a->cOffset);
is undefined behavior in C because struct B and struct C have stricter alignment requirements than struct A. Then what can I do so that the cast is well-defined in C?
What I can come up with now is to add a long long variable to struct A, as follows.
struct A {
int a;
int bOffset;
int cOffset;
long long unused;
};
Another question is if I dereference b or c, it is also an UB. Is there any way to address this issue?
CodePudding user response:
Then what can I do so that the cast is well-defined in C?
To calculate correctly where struct B and struct C should be placed, you should pad the previous sizes to the necessary alignment. C provides the _Alignof operator, which gives the alignment requirement of a type. So this code will do the job:
/* Calculate how many bytes are required to add to size s to make it be a
multiple of alignment a. If s is a multiple of a, this is zero.
Otherwise, we need to add a-r bytes, where r is the remainder of s divided
by a.
Omitting the parentheses used for macro parameters, the following code is
a - (s-1)%a - 1. To see it works, consider two cases:
s is a multiple of a. Then s-1 is a-1 modulo a, and the expression
evaluates to a - (a-1) - 1 = 0.
s has some non-zero remainder r modulo a. Then (s-1)%a evaluates to
r-1, and the expression evaluates to a - (r-1) - 1 = a-r.
*/
#define PadToAlignment(s, a) ((a) - ((s)-1) % (a) - 1)
…
// Add padding needed to align struct B and struct C correctly.
aSize += PadToAlignment(aSize, _Alignof (struct B));
bSize += PadToAlignment(aSize + bSize, _Alignof (struct C));
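To make the arithmetic concrete, here is a minimal sketch that wraps the macro in a small function. The name round_up is hypothetical (it is not part of the answer's code); it simply adds the padding to the size.

```c
#include <assert.h>
#include <stddef.h>

/* Same macro as above: bytes to add to s to reach a multiple of a. */
#define PadToAlignment(s, a) ((a) - ((s)-1) % (a) - 1)

/* Hypothetical helper: round size s up to the next multiple of alignment a. */
static size_t round_up(size_t s, size_t a)
{
    assert(a != 0);                  /* an alignment is never zero */
    return s + PadToAlignment(s, a);
}
```

For example, with an alignment of 8, round_up(12, 8) is 16 (padding 4), round_up(16, 8) stays 16 (padding 0), and round_up(17, 8) is 24 (padding 7).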
Notes
You should generally use size_t rather than int for sizes. Also, when using types with sizeof and _Alignof, I prefer not to write them like function calls, as in sizeof(int), because they are not function calls. Rather, they are operators with an operand that is a type name enclosed in parentheses for grammatical reasons, so sizeof (int) helps remind readers of the meaning of the C code.
Here is a complete program incorporating these:
#include <stdio.h>
#include <stdlib.h>
/* Calculate how many bytes are required to add to size s to make it be a
multiple of alignment a. If s is a multiple of a, this is zero.
Otherwise, we need to add a-r bytes, where r is the remainder of s divided
by a.
Omitting the parentheses used for macro parameters, the following code is
a - (s-1)%a - 1. To see it works, consider two cases:
s is a multiple of a. Then s-1 is a-1 modulo a, and the expression
evaluates to a - (a-1) - 1 = 0.
s has some non-zero remainder r modulo a. Then (s-1)%a evaluates to
r-1, and the expression evaluates to a - (r-1) - 1 = a-r.
*/
#define PadToAlignment(s, a) ((a) - ((s)-1) % (a) - 1)
struct A {
int a;
int bOffset;
int cOffset;
};
struct B {
long long b;
int other[];
};
struct C {
long long c;
int other[];
};
int main(void)
{
int otherSize = 0;
if (1 != scanf("%d", &otherSize))
{
fprintf(stderr, "Error, scanf failed.\n");
exit(EXIT_FAILURE);
}
size_t aSize = sizeof (struct A);
size_t bSize = sizeof (struct B) + sizeof (int) * otherSize;
size_t cSize = sizeof (struct C) + sizeof (int) * otherSize;
// Add padding needed to align struct B and struct C correctly.
aSize += PadToAlignment(aSize, _Alignof (struct B));
bSize += PadToAlignment(aSize + bSize, _Alignof (struct C));
size_t totalSize = aSize + bSize + cSize;
unsigned char *RawMemory = malloc(totalSize);
if (!RawMemory)
{
fprintf(stderr, "Error, unable to allocate memory.\n");
exit(EXIT_FAILURE);
}
struct A *a = (struct A *) RawMemory;
a->bOffset = aSize;
a->cOffset = aSize + bSize;
struct B *b = (struct B *) (RawMemory + a->bOffset);
struct C *c = (struct C *) (RawMemory + a->cOffset);
printf("a is at %p.\n", (void *) a);
printf("b is at %p.\n", (void *) b);
printf("c is at %p.\n", (void *) c);
free(RawMemory);
}
Another question is if I dereference b or c, it is also an UB.
Memory allocated by malloc has no effective type. It may be used as any type by storing data there through an lvalue of that type.
The C standard’s rules about effective types in dynamically allocated memory involving structures are incomplete; the natural language wording is insufficient to write a formal semantic description. Certainly it is clear that if a struct S has members a, b, and c and no others, and we do:
struct S *p = malloc(sizeof *p);
p->a = 3;
p->b = 4;
p->c = 5;
then, for aliasing considerations, there should be a struct S at the memory address p even though the memory has only been written in parts, never with a full struct S lvalue. But the standard’s rules about effective type do not make this clear; they are simply inadequate.
Putting multiple structures in the memory further complicates this. However, the purpose of the standard’s aliasing rules is to specify when objects can or cannot be aliased (and hence what optimizations the compiler can do regarding these). For practical purposes, as long as you use these structures in normal ways (the memory you designate for struct A as a struct A, the memory you designate for struct B as a struct B, and the memory you designate for struct C as a struct C), then you are not aliasing the memory with other types, and compilers are not going to perform unexpected optimizations that break that. I expect it is safe to use the allocated memory in this way.