Why does using the "register" keyword make my code faster?

I am learning/re-learning C, and I've learnt about the register keyword. On many websites, people say it is not recommended, or even that it is useless. The book I am using says it is useful in for-loops, so I tried it, first without the keyword:

#include <stdio.h>
#include <time.h>

int main() {
    int count = 1;

    clock_t start = clock();
    while(count != 0) {
        count++;
    }
    clock_t stop = clock();
    double time = (double)(stop - start);

    printf("%f", time);

    return 0;
}

and then with it:

[...] register int count = 1; [...]

Compiling the source code with GCC, I have seen that using register made the program around 5x faster.

So my question is: is there something wrong with my setup (e.g. my GCC), or is the register keyword actually useful?

Answer:

Even though signed overflow is undefined behavior, a compiled binary does have definite behavior, so we can examine why using register makes the code faster.

When code relies on undefined behavior and is compiled without optimizations, GCC can produce completely different assembly even for small, apparently insignificant, changes.
register is one such change, since it is a historical micro-optimization that GCC (apparently) no longer honors.
Quoting the GNU C Language Introduction and Reference Manual:

20.10 auto and register
For historical reasons, you can write auto or register before a local variable declaration. auto merely emphasizes that the variable isn’t static; it changes nothing.
register suggests to the compiler storing this variable in a register. However, GNU C ignores this suggestion, since it can choose the best variables to store in registers without any hints.

However, the two binaries differ precisely in that count is held in a register (ebx, specifically) rather than in a stack variable, and in whether a frame pointer is set up.

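So register does indeed make your code faster. In the disassembly of the two builds (at the -O0 optimization level), the version without it contains add [rbp+count], 1 (a 32-bit memory increment, although IDA does not show the operand size), while the version with the register modifier contains add ebx, 1.
The 5x slowdown seems to match the store-to-load forwarding latency: with count in memory, each iteration's load has to wait for the previous iteration's store to be forwarded, which takes several cycles, whereas add ebx, 1 depends only on the previous add.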

Note, however, that GCC may keep count in a register or in memory as it sees fit (except, perhaps, when it is declared volatile). The placement can change if you change int to unsigned int or unsigned long long, if too many other local variables are in use, or with other compiler switches (such as specific optimizations).

register had the desired effect in this simple code because the compiler's analysis imposed no other constraints.
It is interesting, however, that GCC does not ignore it completely, as Stallman's manual claims it does.

Answer:

Your program has undefined behavior because it relies on count becoming 0 through incrementing, which may happen as a side effect of signed arithmetic overflow but is not defined by the C standard.

You could modify your test by defining count as an unsigned int, since unsigned int arithmetic is defined as computed modulo UINT_MAX + 1.

#include <stdio.h>
#include <time.h>

int main() {
    unsigned int count = 1;

    clock_t start = clock();
    while(count != 0) {
        count++;
    }
    clock_t stop = clock();
    double time = (double)(stop - start);

    printf("%f\n", time);

    return 0;
}

Yet the above code executes instantly when compiled with optimisations, because the compiler can determine that no side effect occurs during the 4294967295 iterations (assuming a 32-bit unsigned int) leading to count == 0. Adding a register keyword does not change that.

Compiling with optimisations disabled gives a different result for gcc, as can be observed on Godbolt's Compiler Explorer. Without the register keyword, the variable count is stored in memory within the stack frame, whereas it is indeed stored in register