Why is the Index NOT out of bounds although it intuitively should be?

I'm relatively new to C programming and stumbled upon behaviour I cannot explain while running the following code and debugging it with gdb and lldb.

In short: when I swap the indices i and j (max i != max j) while accessing a value in a two-dimensional array inside a nested for-loop, it does not seem to matter whether I access the value using array[i][j] or array[j][i].

The two loops and arrays are mostly identical.

 unsigned matrix[3][1] =
   {
       {3},
       {4},
       {5}
   };

   //Loop1
   for (size_t i = 0; i < sizeof(matrix) / sizeof(*matrix); i++)
   {
        for (size_t j = 0; j < sizeof(matrix[i]) / sizeof(*matrix[i]); j++)
        {
            matrix[i][j] <<= 1;
            printf("matrix[%zu][%zu]  has the value: %u\n", i, j, matrix[i][j]);
        }
   }

   //same two dimensional array as matrix
   unsigned matrix2[3][1] =
   {
       {3},
       {4},
       {5}
   };

   //Loop2, basically the same loop as Loop1
   for (size_t i = 0; i < sizeof(matrix2) / sizeof(*matrix2); i++)
   {
        for (size_t j = 0; j < sizeof(matrix2[i]) / sizeof(*matrix2[i]); j++)
        {
            //swapped i and j here
            matrix2[j][i] <<= 1;
            printf("matrix2[%zu][%zu]  has the value: %u\n", j, i, matrix2[j][i]);
        }
   }

Am I missing something here?

In both cases i reaches the value 2 at the end of the outer loop and j the value 0 at the end of the inner loop.

Intuitively, matrix[0][2] should throw an exception as each row only has one element.

CodePudding user response：

I will take a slightly different approach than the other respondents.

You are technically not reading outside of the array's boundary as far as the memory layout is concerned. Looking at it from a human perspective you are (the index [0][2] doesn't exist!), but the memory layout of the array is contiguous. The "rows" of the matrix are stored next to each other.

In memory, your array is stored as: | ? | 3 | 4 | 5 | ? | So when you index matrix[1][0] or matrix[0][1] you are accessing the same position in memory. This would not be the case if each row of your array held more than one element.

For example, replace your array with the following one and experiment. You can access the integer '4' either by indexing matrix[0][2] or matrix[1][0]. The position [0][2] shouldn't exist, but it does because the memory is contiguous.

unsigned matrix[3][2] =
   {
       {3, 6},
       {4, 8},
       {5, 10}
   };

CodePudding user response：

Oops, matrix[0][2] should throw an exception as each row only has one element...

Some languages do warn the programmer with an exception when an out-of-bounds access is attempted, but C does not. It just invokes Undefined Behaviour. From a technical point of view, this means that the compiler does not have to test for the out-of-bounds condition. From an operational point of view, it means that anything can happen, including the expected behaviour... or an immediate crash... or a modification of an unrelated variable... or...

CodePudding user response：

If my C skills aren't mega-rusty, you're reading "unsafe memory".

Essentially your matrix is declared as a block of bytes. After that block of bytes there are more bytes. What are they? Usually other variables that belong to your program. Go far enough and you reach other regions of the process which, depending on the platform, may hold control data such as return addresses or even the program's machine instructions.

Most languages perform checks and throw an exception when you go out of bounds, by keeping track of the last index that is valid to access. C does not do that; doing so is your own responsibility. If you aren't careful you might overwrite important parts of your program's memory.

There are attacks that can be performed on C programs that don't sanitize user input, such as a buffer overrun, which exploits exactly what has been described here.

Essentially, if you declare a char[] of length N and store in it a string that comes from outside and happens to be of length N + X, you'll be overwriting program memory beyond the array.

With the right sequence of characters you can inject your very own code into a running program which doesn't sanitize user input.