Is it valid to typecast to a pointer to a variable length array type?
#include <stdio.h>
#include <stdlib.h>

int main() {
    int n = 10;
    int m = 20;
    int (*p)[n][m] = malloc(sizeof(int[n][m]));
    void *q = p;
    int a = n;
    int b = m;
    (*(int (*)[a][b])q)[5][5] = 1;
    printf("%zu %d\n", sizeof(*p), (*p)[5][5]);
    return 0;
}
This prints 800 1.
With int a = 1; it still prints 800 1.
With int a = 1; int b = 1; it now prints 800 0.
Up to which point is this well-defined behavior?
CodePudding user response:
It's undefined behavior since you access an array out of bounds.
It might be worth pointing out that the C type system is a bit muddy here, though. Memory allocated with malloc isn't given a type until accessed. I'll quote C17 6.5/6 and leave comments in between:
The effective type of an object for an access to its stored value is the declared type of the object, if any.
This "heap chunk" returned from malloc has no declared type.
If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.
Meaning that if we store something at a given place in the heap chunk, that address will get an effective type and gets treated as an object (a single int in this case). The compiler doesn't really know if the chunk as a whole is to be regarded as an array, a struct or something else, though.
For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
Meaning that if we read this data before writing to it, it will get treated as a variable of the type used for reading. (The heap chunk isn't initialized by malloc though, so reading before writing doesn't make much sense.)
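To make the effective type rules quoted above concrete, here is a minimal sketch. The helper name store_then_read is hypothetical and not part of the question's code; it only illustrates that a store through an int lvalue is what gives the allocated object its effective type.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates untyped storage, stores an int into it,
   and reads the value back through the same type. */
int store_then_read(int value)
{
    void *chunk = malloc(sizeof(int));  /* no declared type, no effective type yet */
    assert(chunk != NULL);              /* sketch only: real code should handle failure */

    int *ip = chunk;                    /* converting the pointer does not access the object */
    *ip = value;                        /* this store gives the object effective type int */

    int result = *ip;                   /* read through a compatible lvalue: well-defined */
    free(chunk);
    return result;
}
```

Calling store_then_read(42) returns the stored value, since the read uses the same type as the store.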
Assuming a 32-bit int, malloc(sizeof(int[n][m])); allocates a chunk of 10*20*4 = 800 bytes. This chunk set aside by malloc does not yet have an effective type. The compiler does not yet keep track of what type is stored in that data. The fact that a pointer of a certain type points at the data doesn't change that: as long as the data isn't dereferenced, it does not yet have an effective type.
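The size arithmetic can be checked with a small sketch. The helper vla_bytes is a hypothetical name introduced here for illustration; it assumes a compiler with VLA support (mandatory in C99, optional from C11 onward).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes of an n-by-m array of int.
   Because int[n][m] is a VLA type, sizeof is evaluated at run time. */
size_t vla_bytes(int n, int m)
{
    assert(n > 0 && m > 0);     /* a VLA dimension must be positive */
    return sizeof(int[n][m]);   /* equals n * m * sizeof(int) */
}
```

With n = 10 and m = 20 this gives 200 * sizeof(int), which is 800 only on platforms where int is 4 bytes.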
Normally the heap chunk wouldn't get a type until you do the [5][5] dereferencing and write 1 to a location in that chunk. That location can now be said to have effective type int.
However, before you do that, you do *(int (*)[a][b])q. The type of q is irrelevant here: you cast it and say that here is a pointer to an array. Then you dereference that and get an array, which is an lvalue (and it will decay into a pointer to the first element of the array). From that point on, at least in this expression, the compiler can assume that it's dealing with an array, regardless of whether that array happens to be stored in memory with no effective type. If this array is a VLA and doesn't have dimensions large enough for the ... [5][5] = 1; access, you end up with a plain old out-of-bounds array access, which is undefined behavior.
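One way to stay on the well-defined side is to check the indices against the dimensions named in the cast before subscripting. The following is a sketch under that assumption; both function names are hypothetical, and the caller is still responsible for q pointing at storage large enough for a*b ints.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: writes value at [row][col] through a cast to a
   pointer-to-VLA with caller-supplied dimensions a and b.
   Returns 1 on success, 0 if the indices fall outside int[a][b]. */
int write_through_vla_cast(void *q, int a, int b, int row, int col, int value)
{
    if (row < 0 || row >= a || col < 0 || col >= b)
        return 0;                           /* would be out of bounds for the claimed type */
    (*(int (*)[a][b])q)[row][col] = value;  /* in bounds for int[a][b] */
    return 1;
}

/* Hypothetical demo mirroring the question: 10x20 storage, access at [5][5]. */
int demo_matching_and_mismatched_dims(void)
{
    int n = 10, m = 20;
    int (*p)[n][m] = malloc(sizeof(int[n][m]));
    assert(p != NULL);                      /* sketch only: real code should handle failure */

    int ok_full  = write_through_vla_cast(p, n, m, 5, 5, 1);  /* dimensions match the allocation */
    int ok_small = write_through_vla_cast(p, 1, 1, 5, 5, 1);  /* [5][5] does not fit int[1][1] */
    int stored   = (*p)[5][5];              /* written by the first call */

    free(p);
    return ok_full && !ok_small && stored == 1;
}
```

The second call corresponds to the a = 1, b = 1 variant in the question; here it is rejected instead of being executed.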
Specifically, UB as per 6.5.6/8 additive operators:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
arr[i] being 100% equivalent to *(arr + i), the + operator sets the rules for pointer arithmetic quoted above: you are not allowed to use pointer arithmetic to access an array or object out of bounds, since that is undefined behavior. This means the compiler can generate incorrectly behaving code, regardless of whether there's available memory at the supposed address or not.
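The equivalence between subscripting and pointer arithmetic, and the one-past-the-end allowance from 6.5.6/8, can be sketched as follows. Both helper names are hypothetical; the caller is assumed to pass a valid array of at least count elements (or an index i within it).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums count ints by walking a pointer up to,
   but never dereferencing, the one-past-the-end position. */
int sum_ints(const int *arr, size_t count)
{
    assert(arr != NULL);
    const int *end = arr + count;   /* one past the last element: may be formed, not dereferenced */
    int total = 0;
    for (const int *it = arr; it != end; ++it)
        total += *it;
    return total;
}

/* Returns 1 when arr[i] and *(arr + i) designate the same element. */
int subscript_matches_arithmetic(const int *arr, size_t i)
{
    return &arr[i] == arr + i && arr[i] == *(arr + i);
}
```

Forming arr + count + 1, or dereferencing arr + count, would fall under the "otherwise, the behavior is undefined" part of the quoted paragraph.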