In C, why can we access an array element whose index is out of range?

int arr[5];
printf("%d", arr[200]);

This gives a garbage value. Why is this?

CodePudding user response：

C does not enforce bounds checks on arrays, a design choice that was probably made for performance reasons. Your program exhibits undefined behavior (UB) when it tries to access the array out of bounds. Even an in-bounds read would be a problem here, because the array is not initialized and its elements hold indeterminate values.

Some bounds checks can be done at compile time. For instance, if you build your program with gcc -O2 -Warray-bounds 1.c:

#include <stdio.h>

int main(void) {
        int arr[5];
        printf("%d", arr[200]);
}

The compiler will warn you about this issue:

1.c: In function ‘main’:
1.c:5:2: warning: array subscript 200 is above array bounds of ‘int[5]’ [-Warray-bounds]
    5 |  printf("%d", arr[200]);
      |  ^~~~~~~~~~~~~~~~~~~~~~
1.c:4:6: note: while referencing ‘arr’
    4 |  int arr[5];
      |      ^~~

Other boundary checks would need to happen at run-time. If you build this program with gcc -fsanitize=bounds 2.c:

#include <stdio.h>

int main(void) {
        int i;
        scanf("%d", &i);
        int arr[5] = {0, 1, 2, 3, 4};
        printf("%d", arr[i]);
}

The program will now give you a run-time error for input 200 as it's out of bounds:

2.c:7:18: runtime error: index 200 out of bounds for type 'int [5]'

CodePudding user response：

C is now a rather old language. It was created in the 1970s as the primary language for building the first Unix operating systems (kernel, libraries and commands). The main goal was to have something that was easy to translate efficiently into machine code, yet more or less portable.

For that reason, performance was much more important than robustness or portability. The rule was that programmers had to know what they were writing, and the compiler should just blindly obey. The notion of an array was as simple as possible: just a start address, with the size used only at declaration time to reserve memory. Not only did the language not check index validity, it did not even carry the information needed to perform such checks. This is still part of the philosophy of the language, and it is the reason why, even in recent versions, an array decays to a pointer when it is used where a pointer could be.

Of course, compilers are now much more user-friendly, and good ones warn the programmer about common errors that can be detected at compile time. But because of the philosophy of the language (and to avoid breaking legacy code), those are only warnings, and the compiler will still do its best to translate the code as closely as possible into machine code. Good programmers simply know that the standard says such code invokes undefined behavior, and do not write it.

CodePudding user response：

In C, reading an array element means reading the bytes at the offset corresponding to the index.

In your example, arr is defined as an array of 5 int: to compile printf("%d", arr[200]), the compiler generates code that computes the address of an int at index 200 of array arr. The code multiplies the index value by the number of bytes in an int (sizeof(int), probably 4 on your target), adds this offset (800) to the address of arr, and generates the code to read 4 bytes at this address and pass them to printf.

The computed address may be invalid and cause a runtime error such as a segmentation fault, but the C Standard simply specifies this as undefined behavior, i.e., anything can happen.