Why don't modern compilers coalesce neighboring memory accesses?

Time:11-26

Consider the following code:

bool AllZeroes(const char buf[4])
{
    return buf[0] == 0 &&
           buf[1] == 0 &&
           buf[2] == 0 &&
           buf[3] == 0;
}

Output assembly from Clang 13 with -O3:

AllZeroes(char const*):                        # @AllZeroes(char const*)
        cmp     byte ptr [rdi], 0
        je      .LBB0_2
        xor     eax, eax
        ret
.LBB0_2:
        cmp     byte ptr [rdi + 1], 0
        je      .LBB0_4
        xor     eax, eax
        ret
.LBB0_4:
        cmp     byte ptr [rdi + 2], 0
        je      .LBB0_6
        xor     eax, eax
        ret
.LBB0_6:
        cmp     byte ptr [rdi + 3], 0
        sete    al
        ret

Each byte is compared individually, but it could've been optimized into a single 32-bit int comparison:

bool AllZeroes2(const char buf[4])
{
    return *(int*)buf == 0;
}

Resulting in:

AllZeroes2(char const*):                      # @AllZeroes2(char const*)
        cmp     dword ptr [rdi], 0
        sete    al
        ret

I've also checked GCC and MSVC, and neither of them does this optimization. Is this disallowed by the C specification?

Edit: Changing the short-circuited AND (&&) to bitwise AND (&) will generate the optimized code. Also, changing the order the bytes are compared doesn't affect the code gen: https://godbolt.org/z/Y7TcG93sP
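A minimal sketch of the variants discussed above, for comparison. The names AllZeroesBitwise, AllZeroesMemcpy and CheckVariantsAgree are hypothetical and not from the original post. The memcpy version is one common way to express the 4-byte comparison without the pointer cast, assuming the caller really does pass four readable bytes; compilers typically turn a fixed-size memcpy like this into a single load, though that is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Original form: && short-circuits, so buf[1..3] are read only
// when every earlier byte was zero.
bool AllZeroes(const char buf[4])
{
    return buf[0] == 0 &&
           buf[1] == 0 &&
           buf[2] == 0 &&
           buf[3] == 0;
}

// Bitwise & evaluates both operands, so all four bytes are always read.
// This is what lets the compiler merge the reads into one wider load.
bool AllZeroesBitwise(const char buf[4])
{
    return (buf[0] == 0) &
           (buf[1] == 0) &
           (buf[2] == 0) &
           (buf[3] == 0);
}

// Copies the four bytes into a 32-bit integer and compares once.
// Unlike *(int*)buf, this makes no alignment or aliasing assumptions.
bool AllZeroesMemcpy(const char buf[4])
{
    std::uint32_t word;
    std::memcpy(&word, buf, sizeof word);
    return word == 0;
}

// Hypothetical helper: all three variants should agree whenever
// buf really points at four readable bytes.
void CheckVariantsAgree(const char buf[4])
{
    assert(AllZeroes(buf) == AllZeroesBitwise(buf));
    assert(AllZeroes(buf) == AllZeroesMemcpy(buf));
}
```

The last two variants differ from the first in one respect: they always touch all four bytes, so they are only valid when the whole 4-byte range is readable.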

CodePudding user response:

If buf[0] is nonzero, the code must not access buf[1], so the function has to return false without reading the remaining elements. If buf[0] happens to be the last byte of a readable memory page and the next page is not mapped, reading buf[1] could trigger an access fault. The compiler therefore has to be very careful not to read memory that the source code itself would never have read.
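The page-boundary scenario can be reproduced with a rough, POSIX-only sketch. It assumes mmap, mprotect and sysconf are available, and the helper name LastByteBeforeGuardPage is hypothetical. It maps two pages, makes the second one inaccessible, and returns a pointer to the last readable byte, which is set to a nonzero value.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Same short-circuiting function as in the question.
bool AllZeroes(const char buf[4])
{
    return buf[0] == 0 &&
           buf[1] == 0 &&
           buf[2] == 0 &&
           buf[3] == 0;
}

// Hypothetical helper: returns a pointer to a nonzero byte that is the
// last readable byte before an inaccessible page, or nullptr on failure.
// The mapping is deliberately left in place for the duration of the demo.
char* LastByteBeforeGuardPage()
{
    const long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    const std::size_t page = static_cast<std::size_t>(page_size);

    // Two adjacent anonymous pages, initially readable and writable.
    void* mem = mmap(nullptr, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return nullptr;

    char* base = static_cast<char*>(mem);

    // Revoke all access to the second page; any read there now faults.
    if (mprotect(base + page, page, PROT_NONE) != 0) {
        munmap(mem, 2 * page);
        return nullptr;
    }

    base[page - 1] = 1;      // nonzero, so AllZeroes stops after buf[0]
    return base + page - 1;  // buf[1], buf[2], buf[3] are unreadable
}
```

Calling AllZeroes on the returned pointer is valid and returns false, because only buf[0] is ever read. A version that unconditionally loads all four bytes would read into the protected page and crash, which is why the compiler cannot make that transformation on its own.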

CodePudding user response:

This is short-circuit evaluation, so it can't be optimized the way you expect. If buf[0] == 0 is false, buf[1] must not be evaluated at all. Reading it might be undefined behavior, or the memory might be forbidden to access, and the program must still work in those cases. https://en.wikipedia.org/wiki/Short-circuit_evaluation
