How C/C compiler distinguish regular two dimensional array and array of pointers to arrays?-CodePudding

Regular static allocated array looks like this, and may be accessed using the following formulas:

const int N = 3;
const int M = 3;

int a1[N][M] = { {0,1,2}, {3,4,5}, {6,7,8} };

int x = a1[1][2]; // x = 5 
int y = *(a1 2 N*1); // y = 5, this is what [] operator is doing in the background

Array is continuous region of memory. It looks different in case of dynamic array allocation, there is array of pointer to arrays instead:

int** a2 = new int*[N];
for (int i = 0; i < N; i  ) 
   a2[i] = new int[M];

//Assignment of values as in previous example

int x = a2[1][2];
int y = *(*(a2 1)) 2); // This is what [] operator is doing in the background, it needs to dereference pointers twice

As we can see, operations done by [] operator are completely different in case of typical continuous array and dynamically allocated array. My questions are now following:

Is my understanding of [] operations correct?
How C/C compiler can distinguish which [] operation it should perform, and where it's implemented? I can image implementing it myself in C by overloading [] operator, but how C/C treat this?
Will it work correctly in C language using malloc instead of new? I don't see any reasons why not actually.

CodePudding user response：

For this declaration of an array

int a1[N][M] = { {0,1,2}, {3,4,5}, {6,7,8} };

these records

int x = a1[1][2];
int y = *(a1 2 N*1);

are not equivalent.

The second one is incorrect. The expression *(a1 2 N*1) has the type int[3] that is implicitly converted to an object of the type int * used as an initializer. So the integer variable y is initialized by a pointer.

The operator a1[1] is evaluated like *( a1 1 ) . The result is a one-dimensional array of the type int[3].

So applying the second subscript operator you will get *( *( a1 1 ) 2 ).

The difference between the expressions when used the two-dimensional array and the dynamically allocated array is that the designator of the two-dimensional array in this expression (a1 1) is implicitly converted to a pointer to its first element of the type int ( * )[3] while the pointer to the dynamically allocated array of pointers still have the same type int **.

In the first case dereferencing the expression *(a1 1 ) you will get lvalue of the type int[3] that in turn used in the expression *( a1 1) 2 is again implicitly converted to a pointer of the type int *.

In the second case the expression *(a1 1) yields an object of the type int *.

In the both cases there is used the pointer arithmetic. The difference is that when you are using arrays in the subscript operator then they are implicitly converted to pointers to their first elements.

When you are allocating dynamically arrays when you are already deals with pointers to their first elements.

For example instead of these allocations

int** a2 = new int*[N];
for (int i = 0; i < N; i  ) 
   a2[i] = new int[M];

you could just write

int ( *a2 )[M] = new int[N][M];

CodePudding user response：

Is my understanding of [] operations correct?
int y = *(a1 2 N*1); // y = 5, this is what [] operator is doing in the background

By definition, the way to translate the subscript operators to the corresponding indirection and pointer arithmetic is:

int y = *(*(a1 1) 2)

Which is exactly the same as in the case of int**.

How C/C compiler can distinguish which [] operation it should perform

The compiler uses the type system. It knows the types of the expressions and it knows what subscript operation means for each type.

Will it work correctly in C language using malloc instead of new? I don't see any reasons why not actually.

It doesn't matter how an array is created. Subscript operator works the same way with all pointers.

CodePudding user response：

a1 and a2 are different types, and as such, the behavior of operator [] will depend on how that type defines the operator. In this case you're dealing with intrinsic compiler behaviors that conform to the C spec, but it could just as well be a std::unique_ptr<>, or MyClass with overloaded operator[]

CodePudding user response：

Each operation leads to result of some specific type. Each type defines what kind of operation is available for it.

Note that array has ability to decay to pointer to element of array. So some_array int_value leads to pointer to element.

Here is code which exposes types of each step: https://godbolt.org/z/jeKWh5WWW

#include <type_traits>

const int N = 3;
const int M = 4;

int a1[N][M] = { {0,1,2,0}, {3,4,5,0}, {6,7,8,0} };
int** a2 = new int*[N];

static_assert(
    std::is_same_v<decltype(a1[0][0]), int&>, 
    "value type is reference to int");

static_assert(
    std::is_same_v<decltype(a1[0]), int(&)[M]>, 
    "row type is reference to int aray");

static_assert(
    std::is_same_v<decltype(a1   1), int(*)[M]>, 
    "advanced pointer is pointer to array of ints");

static_assert(
    !std::is_same_v<decltype(a1[0]), int*&>, 
    "row type is reference to int pointer");



static_assert(
    std::is_same_v<decltype(a2[0][0]), int&>, 
    "value type is reference to int");

static_assert(
    !std::is_same_v<decltype(a2[0]), int(&)[M]>,
    "row type is not reference to int aray");

static_assert(
    std::is_same_v<decltype(a2   1), int**>, 
    "advanced pointer is pointer to pointer to int");

static_assert(
    std::is_same_v<decltype(a2[0]), int*&>, 
    "row type is reference to int pointer");

I think this is good appendix to other answers.

CodePudding user response：

How C/C compiler can distinguish which [] operation it should perform, and where it's implemented?

The built-in [] operator (that is, not a user-defined overload) always does one thing: It adds its two operands and dereference the results. E1[E2] is defined to be (*((E1) (E2))). Here is how this works:

If E1 or E2 is an array, it is automatically converted to a pointer to its first element. This not a part of the [] operator per se; it is a built-in part of the C and C languages. In C, the specific rule is that, whenever an array is used in an expression other than as the operand of sizeof, the operand of unary &, or as a string literal used to initialize an array, it is converted to a pointer to its first element.
Thus, whether the code is written with a pointer or an array, [] always has a pointer operand. You may write an array, but [] always receives a pointer.
The operator adds an integer to a pointer by adjusting the pointer by the given number of elements: Given a pointer to element j of an array and an integer k to add to it, it produces a pointer to element j k of the array.
From the pointer to an element, the * operator produces an lvalue for the referenced element.

The combination of automatic array conversion, , and *, means that A[i] produces an lvalue for element i of the array A.

Here is how this works for the expression A[i][j] where A is an array declared as SomeType A[m][n]:

In A[i][j], A is an array of m arrays of n elements. It is automatically converted to a pointer to its first element (the one with index 0).
Then A[i] produces an lvalue for element i of this array. In other words, the result of A[i] is an array; it is an array of n SomeType objects.
Since the result of A[i] is an array, it is automatically converted to a pointer to its first element.
Then A[i][j] produces an lvalue for element j of that array.

Since the pointer arithmetic operates in units of the pointed-to-type, it includes the scaling for the sizes of the elements. This is what makes the calculation of A[i] scaled by the size of the subarray of n elements.

Will it work correctly in C language using malloc instead of new? I don't see any reasons why not actually.

Sure, if done correctly.