Home > Software design >  Why the first actual parameter printing as a output in C
Why the first actual parameter printing as a output in C

Time:10-26

#include <stdio.h> 

int add(int a, int b)
{
    if (a > b)
        return a * b;
}

int main(void)
{
    printf("%d", add(3, 7));
    return 0;
}

Output:

3

In the above code, I am calling the function inside the print. In the function, the if condition is not true, so it won't execute. Then why I am getting 3 as output? I tried changing the first parameter to some other value, but it's printing the same when the if condition is not satisfied.

CodePudding user response:

What happens here is called undefined behaviour.

When (a <= b), you don't return any value (and your compiler probably told you so). But if you use the return value of the function anyway, even if the function doesn't return anything, that value is garbage. In your case it is 3, but with another compiler or with other compiler flags it could be something else.

If your compiler didn't warn you, add the corresponding compiler flags. If your compiler is gcc or clang, use the -Wall compiler flags.

CodePudding user response:

Jabberwocky is right: this is undefined behavior. You should turn your compiler warnings on and listen to them.

However, I think it can still be interesting to see what the compiler was thinking. And we have a tool to do just that: Godbolt Compiler Explorer.

We can plug your C program into Godbolt and see what assembly instructions it outputs. Here's the direct Godbolt link, and here's the assembly that it produces.

add:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     DWORD PTR [rbp-8], esi
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-8]
        jle     .L2
        mov     eax, DWORD PTR [rbp-4]
        imul    eax, DWORD PTR [rbp-8]
        jmp     .L1
.L2:
.L1:
        pop     rbp
        ret
.LC0:
        .string "%d"
main:
        push    rbp
        mov     rbp, rsp
        mov     esi, 7
        mov     edi, 3
        call    add
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        pop     rbp
        ret

Again, to be perfectly clear, what you've done is undefined behavior. With different compiler flags or a different compiler version or even just a compiler that happens to feel like doing things differently on a particular day, you will get different behavior. What I'm studying here is the assembly output by gcc 12.2 on Godbolt with optimizations disabled, and I am not representing this as standard or well-defined behavior.

This engine is using the System V AMD64 calling convention, common on Linux machines. In System V, the first two integer or pointer arguments are passed in the rdi and rsi registers, and integer values are returned in rax. Since everything we work with here is either an int or a char*, this is good enough for us. Note that the compiler seems to have been smart enough to figure out that it only needs edi, esi, and eax, the lower half-words of each of these registers, so I'll start using edi, esi, and eax from this point on.

Our main function works fine. It does everything we'd expect. Our two function calls are here.

mov     esi, 7
mov     edi, 3
call    add
mov     esi, eax
mov     edi, OFFSET FLAT:.LC0
mov     eax, 0
call    printf

To call add, we put 3 in the edi register and 7 in the esi register and then we make the call. We get the return value back from add in eax, and we move it to esi (since it will be the second argument to printf). We put the address of the static memory containing "%d" in edi (the first argument), and then we call printf. This is all normal. main knows that add was declared to return an integer, so it has the right to assume that, after calling add, there will be something useful in eax.

Now let's look at add.

add:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     DWORD PTR [rbp-8], esi
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-8]
        jle     .L2
        mov     eax, DWORD PTR [rbp-4]
        imul    eax, DWORD PTR [rbp-8]
        jmp     .L1
.L2:
.L1:
        pop     rbp
        ret

The rbp and rsp shenanigans are standard function call fare and aren't specific to add. First, we load our two arguments onto the call stack as local variables. Now here's where the undefined behavior comes in. Remember that I said eax is the return value of our function. Whatever happens to be in eax when the function returns is the returned value.

We want to compare a and b. To do that, we need a to be in a register (lots of assembly instructions require their left-hand argument to be a register, while the right-hand can be a register, reference, immediate, or just about anything). So we load a into eax. Then we compare the value in eax to the value b on the call stack. If a > b, then the jle does nothing. We go down to the next two lines, which are the inside of your if statement. They correctly set eax and return a value.

However, if a <= b, then the jle instruction jumps to the end of the function without doing anything else to eax. Since the last thing in eax happened to be a (because we happened to use eax as our comparison register in cmp), that's what gets returned from our function.

But this really is just random. It's what the compiler happened to have put in that register previously. If I turn optimizations up (with -O3), then gcc inlines the whole function call and ends up printing out 0 rather than a. I don't know exactly what sequence of optimizations led to this conclusion, but since they started out by hinging on undefined behavior, the compiler is free to make what assumptions it chooses.

  •  Tags:  
  • c
  • Related