Question about values out of bounds of an array in C

I have a question about this code below:

#include <stdio.h>

char abcd(char array[]);

int main(void)
{
    char array[4] = { 'a', 'b', 'c', 'd' };

    printf("%c\n", abcd(array));

    return 0;
}

char abcd(char array[])
{
    char *p = array;

    while (*p) {
        putchar(*p);
        p++;
    }
    putchar(*p);
    putchar(p[4]);
    
    return *p;
}

Why isn't a segmentation fault generated when this program reaches putchar(*p) right after exiting the while loop? I thought that once p moves past array[3], no value is assigned to the memory locations that follow. For example, I thought accessing p[4] would be illegal because it is out of bounds. On the contrary, this program runs with no errors. Is this because any memory that has no value assigned (in this case, any memory outside the four elements of array) is null, with the value '\0'?

CodePudding user response：

OP seems to think that accessing an array out of bounds should make something special happen.

Accessing outside array bounds is undefined behavior (UB). Anything may happen.

CodePudding user response：

Let's clarify what undefined behavior is.

The C standard is a contract between the developer and the compiler as to what the code means. However, you can write code whose behavior the standard does not define.

One of the most common cases is out-of-bounds access. Other languages say that this should result in an exception or another error. C does not. One argument for this is that checking would add a costly test to every array access.

The compiler does not know that what you are writing is undefined behavior¹. Instead, the compiler assumes that what you write contains no undefined behavior, and translates your code to assembly accordingly.

If you want an example, compile the code below with and without optimizations:

#include <stdio.h>

int table[4] = {0, 0, 0, 0};

int exists_in_table(int v)
{
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) {
            return 1;
        }
    }
    return 0;
}

int main(void) {
    printf("%d\n", exists_in_table(3));
}

Without optimizations, the assembly I get from gcc does what you might expect: it simply reads one element past the end of the array, which might cause a segfault if the array ends right at a page boundary and the next page is not mapped.

With optimizations, however, the compiler looks at your code and reasons that the loop can never reach i == 4 (that iteration would access table[4], which it assumes cannot happen), so a match must be found earlier and the function exists_in_table necessarily returns 1. We get the following, valid, implementation:

exists_in_table(int):
        mov     eax, 1
        ret

Undefined behavior means exactly that: undefined. Such bugs are very tricky to detect since they can be virtually invisible after compiling. You need an advanced static analyzer to interpret the C source code and determine whether what it does can be undefined behavior.

¹ In the general case, that is; modern compilers use some basic static analysis to detect the most common errors.

CodePudding user response：

C does no bounds checking on array accesses; because of how arrays and array subscripting are implemented (an array expression is converted to a pointer to its first element, which carries no length), it can't do any bounds checking. It simply doesn't know that you've run past the end of the array. The operating environment will typically raise a runtime error if you touch memory your process is not allowed to access, such as an unmapped page, but up until that point you can read or clobber any memory following the end of the array.

The behavior on subscripting past the end of the array is undefined: the language definition does not require the compiler or the operating environment to handle it in any particular way. You may get a segfault, you may get corrupted data, you may clobber a frame pointer or return address and put your code in a bad state, or it may appear to work exactly as expected.