Is copying 2D arrays with "memcpy" technically undefined behaviour?-CodePudding

An interesting discussion has arisen in the comments to this recent question: Now, although the language there is C, the discussion has drifted to what the C Standard specifies, in terms of what constitutes undefined behaviour when accessing the elements of a multidimensional array using a function like std::memcpy.

First, here's the code from that question, converted to C and using const wherever possible:

#include <iostream>
#include <cstring>

void print(const int arr[][3], int n)
{
    for (int r = 0; r < 3;   r) {
        for (int c = 0; c < n;   c) {
            std::cout << arr[r][c] << " ";
        }
        std::cout << std::endl;
    }
}

int main()
{
    const int arr[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    int arr_copy[3][3];
    print(arr, 3);
    std::memcpy(arr_copy, arr, sizeof arr);
    print(arr_copy, 3);
    return 0;
}

The issue is in the call to std::memcpy: the arr argument will yield (by decay) a pointer to the first int[3] subarray so, according to one side of the discussion (lead by Ted Lyngmo), when the memcpy function accesses data beyond the third element of that subarray, there is formally undefined behaviour (and the same would apply to the destination, arr_copy).

However, the other side of the debate (to which mediocrevegetable1 and I subscribe) uses the rationale that each of the 2D arrays will, by definition, occupy continuous memory and, as the arguments to memcpy are just void* pointers to those locations (and the third, size argument is valid), then there cannot be UB here.

Here's a summary of some of the comments most pertinent to the debate, in case any "clean-up" occurs on the original question (bolding for emphasis mine):

I don't think there's any out-of-bounds here. Just like memcpy works for an array of ints, it works for an array of int [3]s, and both should be contiguous (but I'm not 100% sure). – mediocrevegetable1

The out of bounds access happens when you copy the first byte from arr[0][3]. I've never seen it actually fail, but, in C , it has UB. – Ted Lyngmo

But the memcpy function/call doesn't do any array indexing - it's just given two void* pointers and copies memory from one to the other. – Adrian Mole

I can't say for sure if that matters in C. In C it doesn't. You get a pointer to the first int[3] and any access out of its range has UB. I haven't found any exception to that in the C standard. – Ted Lyngmo

I don't think the arr[0][3] thing applies. By that logic, I think copying the second int of an int array through memcpy would be UB as well. int [3] is simply the type of arr's elements, and the bounds of arr as a whole in bytes should be sizeof (int [3]) * 3. I'm probably missing something though :/ – mediocrevegetable1

Are there any C Language-Lawyers who can settle the matter – preferably with (an) appropriate citation(s) from the C Standard?

Also, relevant citations from the C Standard may be helpful – especially if the two language Standards differ – so I've included the C tag in this question.

CodePudding user response：

std::memcpy(arr_copy, arr, sizeof arr); (your example) is well-defined.

std::memcpy(arr_copy, arr[0], sizeof arr);, on the other hand, would cause undefined behavior.

Multidimensional arrays are 1D arrays of arrays. As far as I know, they don't get much (if any) special treatment compared to true 1D arrays (i.e. arrays with elements of non-array type).

Consider an example with a 1D array:

int a[3] = {1,2,3}, b[3];
std::memcpy(b, a, sizeof(int) * 3);

This is obviously well-defined, so I'm not going to cite the standard.

Notice that memcpy receives a pointer to the first element of the array, and can access other elements.

The element type doesn't affect the validity of this example. If you use a 2D array, the element type becomes int[N] rather than int, but the validity is not affected.

Now, consider a different example:

int a[2][2] = {{1,2},{3,4}}, b[4];
std::memcpy(b, a[0], sizeof(int) * 4);
//             ^~~~

This one causes UB, because since memcpy is given a pointer to the first element of a[0], it can only access the elements of a[0] (a[0][i]), and not a[j][i].

But, if you want my opinion, this is a "tame" kind of UB, likely to not cause problems in practice (but, as always, UB should be avoided if possible).

CodePudding user response：

What is passed to the function decays into a pointers to the first elements, that is in this case, two int(*)[3]s.

C draft Annex J (informative) Portability issues J.2 Undefined behavior:

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

memcpy(arr_copy, arr, sizeof arr); get's two int(*)[3] and will access both out of range, hence, UB.