An interesting discussion has arisen in the comments to this recent question: Now, although the language there is C, the discussion has drifted to what the C++ Standard specifies, in terms of what constitutes undefined behaviour when accessing the elements of a multidimensional array using a function like std::memcpy.
First, here's the code from that question, converted to C++ and using const wherever possible:
#include <iostream>
#include <cstring>

// Prints an n-row array whose rows each hold 3 ints.
void print(const int arr[][3], int n)
{
    for (int r = 0; r < n; ++r) {
        for (int c = 0; c < 3; ++c) {
            std::cout << arr[r][c] << " ";
        }
        std::cout << std::endl;
    }
}

int main()
{
    const int arr[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    int arr_copy[3][3];
    print(arr, 3);
    std::memcpy(arr_copy, arr, sizeof arr);
    print(arr_copy, 3);
    return 0;
}
The issue is in the call to std::memcpy: the arr argument will yield (by decay) a pointer to the first int[3] subarray so, according to one side of the discussion (led by Ted Lyngmo), when the memcpy function accesses data beyond the third element of that subarray, there is formally undefined behaviour (and the same would apply to the destination, arr_copy).
However, the other side of the debate (to which mediocrevegetable1 and I subscribe) uses the rationale that each of the 2D arrays will, by definition, occupy contiguous memory and, as the arguments to memcpy are just void* pointers to those locations (and the third, size argument is valid), then there cannot be UB here.
Here's a summary of some of the comments most pertinent to the debate, in case any "clean-up" occurs on the original question (bolding for emphasis mine):
I don't think there's any out-of-bounds here. Just like memcpy works for an array of ints, it works for an array of int [3]s, and both should be contiguous (but I'm not 100% sure). – mediocrevegetable1
The out-of-bounds access happens when you copy the first byte from arr[0][3]. I've never seen it actually fail, but, in C++, it has UB. – Ted Lyngmo
But the memcpy function/call doesn't do any array indexing - it's just given two void* pointers and copies memory from one to the other. – Adrian Mole
I can't say for sure if that matters in C. In C++ it doesn't. You get a pointer to the first int[3] and any access out of its range has UB. I haven't found any exception to that in the C++ standard. – Ted Lyngmo
I don't think the arr[0][3] thing applies. By that logic, I think copying the second int of an int array through memcpy would be UB as well. int [3] is simply the type of arr's elements, and the bounds of arr as a whole in bytes should be sizeof (int [3]) * 3. I'm probably missing something though :/ – mediocrevegetable1
Are there any C++ Language-Lawyers who can settle the matter – preferably with (an) appropriate citation(s) from the C++ Standard?
Also, relevant citations from the C Standard may be helpful – especially if the two language Standards differ – so I've included the C tag in this question.
Answer:
Your example, std::memcpy(arr_copy, arr, sizeof arr);, is well-defined.
By contrast, std::memcpy(arr_copy, arr[0], sizeof arr); would cause undefined behavior.
Multidimensional arrays are 1D arrays of arrays. As far as I know, they don't get much (if any) special treatment compared to true 1D arrays (i.e. arrays with elements of non-array type).
Consider an example with a 1D array:
int a[3] = {1,2,3}, b[3];
std::memcpy(b, a, sizeof(int) * 3);
This is obviously well-defined, so I'm not going to cite the standard.
Notice that memcpy receives a pointer to the first element of the array, and can access the other elements.
The element type doesn't affect the validity of this example. If you use a 2D array, the element type becomes int[N] rather than int, but the validity is not affected.
Now, consider a different example:
int a[2][2] = {{1,2},{3,4}}, b[4];
std::memcpy(b, a[0], sizeof(int) * 4);
//             ^~~~
This one causes UB because, since memcpy is given a pointer to the first element of a[0], it can only access the elements of a[0] (a[0][i]), and not a[j][i].
But, if you want my opinion, this is a "tame" kind of UB, likely to not cause problems in practice (but, as always, UB should be avoided if possible).
Answer:
What is passed to the function decays into pointers to the first elements, which in this case are two int(*)[3]s.
C draft Annex J (informative) Portability issues J.2 Undefined behavior:
An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
memcpy(arr_copy, arr, sizeof arr); gets two int(*)[3]s and will access both out of range; hence, UB.