Why is gcc emitting worse code with __builtin

With f0 and f1 as below,

long long b;

void f0(int a) {
    a %= 10;
    if (a == 0) b += 11;
    else if (a == 1) b += 13;
    else if (a == 2) b += 17;
    else if (a == 3) b += 19;
    else if (a == 4) b += 23;
    else if (a == 5) b += 29;
    else if (a == 6) b += 31;
    else if (a == 7) b += 37;
    else if (a == 8) b += 41;
    else if (a == 9) b += 43;
}

void f1(int a) {
    a %= 10;
    if (a == 0) b += 11;
    else if (a == 1) b += 13;
    else if (a == 2) b += 17;
    else if (a == 3) b += 19;
    else if (a == 4) b += 23;
    else if (a == 5) b += 29;
    else if (a == 6) b += 31;
    else if (a == 7) b += 37;
    else if (a == 8) b += 41;
    else if (a == 9) b += 43;
    else __builtin_unreachable();
}

Assuming the argument a is never negative in the program, the compiler should be able to produce better code for f1. In f0, a negative a leaves a % 10 negative, so control can fall through the whole if-else chain, and the compiler has to emit a default "do nothing and return" path. In f1, __builtin_unreachable() states that one of the ten branches is always taken, so the compiler does not have to handle an out-of-range a.
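To make the fall-through case concrete, here is a minimal sketch of what a %= 10 leaves behind for a negative argument. The helper names reduced and hits_a_branch are made up for illustration and are not part of the program above; the sketch assumes C99 or later, where integer division truncates toward zero.

```c
#include <stdbool.h>

/* Hypothetical helper (not from the question): the value that f0 and f1
   compare against after `a %= 10`. In C99 and later, division truncates
   toward zero, so a nonzero remainder has the same sign as a. */
int reduced(int a) {
    return a % 10;
}

/* Hypothetical helper: true when one of the ten `a == k` tests
   (k = 0..9) in the if-else chain would match. */
bool hits_a_branch(int a) {
    int r = reduced(a);
    return r >= 0 && r <= 9;
}
```

For example, reduced(-13) is -3, so hits_a_branch(-13) is false: f0 would return without touching b, while f1 would reach __builtin_unreachable(), which is undefined behavior. For any non-negative a, hits_a_branch(a) is true.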

However, f1 actually runs slower, so I had a look at the disassembly. This is the control flow part of f0.

    jne .L2
    addq    $11, b(%rip)
    ret
    .p2align 4,,10
    .p2align 3
.L2:
    cmpl    $9,