C Compilers -- Indirection with Multidim Arrays

By definition, in every standard of C, x[y] is equivalent to (and often compiled as) *((x) + (y)). Additionally, a name of an array is converted to an address operator to it -- so if x is an array, it would be *((&(x)) + (y))

So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to *(*((&(x)) + (y)) + (z))

In the small scale toy C compiler I'm working on, this fails to generate proper code, because it tries to indirectly access a pointed to address at every * instruction -- this works for single dimension arrays, but for multi dimension it results in something like (in vaguely assembly pseudocode)

load &x; add y; deref; add z; deref

Where deref is an instruction to load the value at the address of the previous calculation -- as this is how the indirection operator seems to work??

However, this will generate bad code, since we should be working with a single address throughout and only dereferencing at the very end. I'm assuming there's something in the spec I'm missing?

CodePudding user response：

name of an array is converted to an address operator to it

No. You could say that x is converted to &x[0], which has a different type than &x.

Assuming you have T a[M][N];, evaluating a[x][y] does the following:

a is converted to a temporary pointer of type T (*)[N], pointing to the first array element.
This pointer is incremented by x * sizeof(T[N]) bytes, i.e. by x * N * sizeof(T).
The pointer is dereferenced, giving you a value of type T[N].
The result is converted to a temporary pointer of type T *.
The pointer is incremented by y * sizeof(T).
Finally, the pointer is dereferenced to produce a value of type T.

Note that an array itself (multidimensional or not) doesn't store any pointers to itself. When converted to a pointer, the resulting pointer is calculated on the fly.

CodePudding user response：

So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to *(*((&(x)) + (y)) + (z))

No, a 2D array is an array of arrays. In *((x) + (y)), x decays into a pointer to its first element; adding y and de-referencing the result gives you array number y.

This array too "decays" into a pointer to its first element, so you get:

*( (*((x) + (y))) + (z) )

When part of an expression, an array decays into a pointer to its first element, with a few exceptions, namely when it is the operand of the & (address-of) or sizeof operator. That is why typing out the & as done in your pseudocode is just confusing.

A practical example would be:

int arr[x][y];
for(size_t i=0; i<x; i++)
  for(size_t j=0; j<y; j++)
    arr[i][j] = ...

In the expression arr[i][j], the [] is just "syntactic sugar" for pointer arithmetic (see Do pointers support "array style indexing"?).
So we get *((arr) + (i)), where arr is decayed into a pointer to its first element, of type int(*)[y].
Pointer arithmetic on that array pointer, followed by the de-reference, yields array number i, of type int [y].
Again, there is array decay on this one, because it too is an array used in an expression. We get a pointer to the first element, type int*.
Pointer arithmetic on the int* with + j gives the address of the integer, which is then finally de-referenced to give the actual int.

CodePudding user response：

So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to *(*((&(x)) + (y)) + (z))

You are mistaken. The expression x[y][z] is evaluated like

*( *( x + y ) + z )

Here is a demonstrative program.

#include <stdio.h>

int main(void) 
{
    enum { M = 3, N = 3 };
    int a[M][N] =
    {
        { 1, 2, 3 },
        { 4, 5, 6 },
        { 7, 8, 9 }
    };
    
    for ( size_t i = 0; i < M; i++ )
    {
        for ( size_t j = 0; j < N; j++ )
        {
            printf( "%d ", *( *( a + i ) + j ) );
        }
        putchar( '\n' );
    }

    return 0;
}

Its output is

1 2 3 
4 5 6 
7 8 9

Array designators used in expressions (with rare exceptions) are implicitly converted to pointers to their first elements.

So if you have an array declared like

int a[M][N];

then the array designator a is converted to a pointer to its first element ("row"). The type of the array element is int[N]. So a pointer to such an object has the type int ( * )[N].

If you want a pointer to the i-th element of the array, you need to write the expression a + i. Dereferencing that expression gives you the i-th row (a one-dimensional array), which in turn, when used in an expression, is converted to a pointer to its first element.

So the expression a + i has the type int ( * )[N].

The expression *( a + i ) has the type int[N], which is at once implicitly converted in the enclosing expression to a pointer of the type int * to its first element.

The expression *( a + i ) + j points to the j-th element of the "row" of the two-dimensional array. Dereferencing it, as in *( *( a + i ) + j ), gives you the j-th element of the i-th row of the array.