C Compilers -- Indirection with Multidim Arrays

Time:10-05

By definition, in every standard of C, x[y] is equivalent to (and often compiled as) *((x) + (y)). Additionally, the name of an array is converted to an address-of operation on it -- so if x is an array, it would be *((&(x)) + (y))

So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to *(*((&(x)) + (y)) + (z))

In the small scale toy C compiler I'm working on, this fails to generate proper code, because it tries to indirectly access a pointed to address at every * instruction -- this works for single dimension arrays, but for multi dimension it results in something like (in vaguely assembly pseudocode)

load &x; add y; deref; add z; deref

Where deref is an instruction to load the value at the address of the previous calculation -- as this is how the indirection operator seems to work??

However, this will generate bad code, since we should be working with a single address throughout and dereferencing only at the very end. I'm assuming there's something in the spec I'm missing?

CodePudding user response:

name of an array is converted to an address operator to it

No. You could say that x is converted to &x[0], which has a different type from &x.

Assuming you have T a[M][N];, doing a[x][y] does the following:

  • a is converted to a temporary pointer of type T (*)[N], pointing to the first array element.

  • This pointer is incremented by x * sizeof(T[N]), i.e. by x * N * sizeof(T).

  • The pointer is dereferenced, giving you a value of type T[N].

  • The result is converted to a temporary pointer of type T *.

  • The pointer is incremented by y * sizeof(T).

  • Finally, the pointer is dereferenced to produce a value of type T.

Note that an array itself (multidimensional or not) doesn't store any pointers to itself. When converted to a pointer, the resulting pointer is calculated on the fly.

CodePudding user response:

So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to *(*((&(x)) + (y)) + (z))

No, a 2D array is an array of arrays. In *((x) + (y)), x decays into a pointer to its first element; that pointer is offset by y and then de-referenced to give you array number y.

This array too "decays" into a pointer to its first element, so you get:

*( (*((x) + (y))) + (z) )

When used in an expression, an array decays into a pointer to its first element, with a few exceptions, namely when it is the operand of the & (address-of) or sizeof operator. That is why typing out the & as done in your pseudocode is just confusing.

A practical example would be:

int arr[x][y];
for(size_t i=0; i<x; i++)
  for(size_t j=0; j<y; j++)
    arr[i][j] = ...
  • In the expression arr[i][j], the [] is just "syntactic sugar" for pointer arithmetic (see Do pointers support "array style indexing"?).
  • So we get *((arr) + (i)), where arr has decayed into a pointer to its first element, of type int(*)[y].
  • Pointer arithmetic on that array pointer, followed by the dereference, yields array number i, of type int [y].
  • Again, there is array decay on this one, because it too is an array used in an expression. We get a pointer to its first element, of type int*.
  • Pointer arithmetic int* + j gives the address of the integer, which is then finally de-referenced to give the actual int.

CodePudding user response:

So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to *(*((&(x)) + (y)) + (z))

You are mistaken. The expression x[y][z] is evaluated like

*( *( x + y ) + z )

Here is a demonstrative program.

#include <stdio.h>

int main(void) 
{
    enum { M = 3, N = 3 };
    int a[M][N] =
    {
        { 1, 2, 3 },
        { 4, 5, 6 },
        { 7, 8, 9 }
    };
    
    for ( size_t i = 0; i < M; i++ )
    {
        for ( size_t j = 0; j < N; j++ )
        {
            printf( "%d ", *( *( a + i ) + j ) );
        }
        putchar( '\n' );
    }

    return 0;
}

Its output is

1 2 3 
4 5 6 
7 8 9 

Array designators used in expressions (with rare exceptions) are implicitly converted to pointers to their first elements.

So if you have an array declared like

int a[M][N];

then the array designator a is converted to a pointer to its first element ("row"). The type of the array element is int[N]. So a pointer to such an object has the type int ( * )[N].

To get a pointer to the i-th element of the array, write the expression a + i. Dereferencing that expression gives the i-th row (a one-dimensional array), which in turn, when used in an expression, is converted to a pointer to its first element.

So the expression a + i has the type int ( * )[N].

The expression *( a + i ) has the type int[N], which is immediately and implicitly converted in the enclosing expression to a pointer of type int * to its first element.

The expression *( a + i ) + j points to the j-th element of the "row" of the two-dimensional array. Dereferencing it, as in *( *( a + i ) + j ), gives the j-th element of the i-th row of the array.
