In C, is it "legal" to under-allocate memory to a pointer-to-array if we then only access elements that fall within the allocated memory? Or does this invoke undefined behavior?
int (*foo)[ 10 ]; //Pointer to array of 10 ints
foo = malloc( sizeof( int ) * 5 ); //Under-allocation!
//Only enough memory for 5 ints
//Now we only ever access (*foo)[ 0 ] through (*foo)[ 4 ]
If this is not undefined behavior in and of itself, could accessing another, unrelated object whose memory address happens to fall within the unallocated part of the array cause a strict-aliasing violation?
CodePudding user response:
This is undefined behavior.
foo is supposed to point to an object (or the first of an array of objects) of type int[10]. This is considered an object of array type, defined in section 6.2.5p20 of the C standard:
An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T , the array type is sometimes called ‘‘array of T ’’. The construction of an array type from an element type is called ‘‘array type derivation’’
The important part is "a contiguously allocated nonempty set of objects". An int[10] is therefore a contiguously allocated set of 10 objects of type int.
You don't allocate enough space, so the expression *foo, which has type int[10], accesses an object of that type, but doing so reads past the end of an allocated memory segment.
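For contrast, here is a minimal sketch of two allocations that avoid the question entirely: either allocate the full int[10], or use a plain int pointer when only five ints are needed. The typedef ten_ints and the helpers alloc_full_array and alloc_five_ints are hypothetical names introduced for illustration; they do not appear in the question.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical typedef, used only to keep the return type readable. */
typedef int ten_ints[10];

/* Hypothetical helper: allocates a complete int[10], so *foo refers to
   fully backed storage. Only the first five elements are initialized.
   Returns NULL if malloc fails; the caller frees the result. */
ten_ints *alloc_full_array(void)
{
    ten_ints *foo = malloc(sizeof *foo);   /* sizeof *foo == 10 * sizeof(int) */
    if (foo == NULL)
        return NULL;
    for (size_t i = 0; i < 5; i++)
        (*foo)[i] = (int)i;
    return foo;
}

/* Hypothetical helper: when only five ints are wanted, a plain int
   pointer matches the allocation size exactly.
   Returns NULL if malloc fails; the caller frees the result. */
int *alloc_five_ints(void)
{
    int *bar = malloc(5 * sizeof *bar);
    if (bar == NULL)
        return NULL;
    for (size_t i = 0; i < 5; i++)
        bar[i] = (int)i;
    return bar;
}
```

In both cases the pointed-to type never describes more storage than malloc returned, so the argument above does not come into play.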
CodePudding user response:
As @dbush describes in his answer, an array is defined to be a contiguously allocated non-empty set of objects of the element type (C17 6.2.5/20). Clearly, then, malloc( sizeof( int ) * 5 ) does not allocate enough space for an int[10].
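The size mismatch can be checked directly. This is a minimal sketch; the function names bytes_requested and bytes_required are hypothetical and serve only to put the two quantities side by side.

```c
#include <stddef.h>

/* Bytes the question's malloc call asks for: room for 5 ints. */
size_t bytes_requested(void)
{
    return sizeof(int) * 5;
}

/* Bytes occupied by an object of the pointed-to type, int[10]. */
size_t bytes_required(void)
{
    int (*foo)[10] = NULL;  /* never dereferenced: sizeof does not evaluate
                               its operand, since int[10] is not a VLA */
    return sizeof *foo;     /* same as sizeof(int[10]), i.e. 10 * sizeof(int) */
}
```

Whatever sizeof(int) is on a given implementation, bytes_required() is exactly twice bytes_requested(), so the allocation covers only half of what the type int[10] describes.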
But I found it difficult to formally support the last part of that answer, which claims that the size differential makes (for example) (*foo)[4] have undefined behavior. That conclusion seems plausible, but where does the standard actually say so?
One of the main problems here is that (dynamically) allocated objects have no declared type, only, under some circumstances, an effective type determined by how they are and have been accessed (C17 6.5/6 and footnote 88). We do know that on success, malloc(n) returns a pointer to an object of size n (C17 7.22.3.4/2), but how do we attribute undefined behavior specifically to the association with that object of an effective type describing objects of size larger than n?
I ultimately decided that the best way to connect the dots is as follows. Suppose that o is an allocated object of size n, T is a complete type having sizeof(T) > n, and o is read or written via an lvalue of type T. Then paragraph 6.5/6 attributes effective type T to object o, but because o's size is insufficient we must conclude that its representation constitutes a trap representation of type T (C17 3.19.4). Paragraph 6.2.6.1/5 then reiterates the definition of "trap representation" and gets us to where we want to go:
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.
(Emphasis added.)