Why adding one integer to a function does not cut recursion stack depth in half

Edit1: As advised by some comments, I printed the address of the variable.

Edit2: As advised by some comments, I added a bit of manipulation so that compilers cannot simply throw my variable away.

Edit3: As advised by one answer, I revised the way a variable's address is printed.

Edit4: Added quite a bit of nonsense to make life difficult for gcc.

Edit5: Added snippet three to test @4386427's theory. The result appears to support his idea that a compiler may reserve 32 bytes by default, so we may need to define at least 5 variables to see a difference.

I have some basic understanding of stack memory and heap memory. Taking C as an example: if I define a local variable in a function, it occupies stack memory; if I define a pointer and allocate a few memory blocks to it, those blocks occupy heap memory. If a function keeps calling itself recursively, the stack eventually fills up and overflows. So I did a simple test; the only difference between snippets one and two is that snippet two defines one more integer:

snippet one:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int function(int depth) {
  int tmp = rand() % 65536;
  tmp = tmp - 1;
  printf("val: %d; addr: %p; depth: %d\n", tmp, (void*)&tmp, depth);
  tmp = function(++depth) + 1;
  return tmp;
}
int main() {
  srand(time(NULL));
  int res = function(0);
  printf("%d\n", res);
  return 0;
}

output one:

...
val: 57227; addr: 0x7fff00dff78c; depth: 174626
val: 8288; addr: 0x7fff00dff75c; depth: 174627
val: 24194; addr: 0x7fff00dff72c; depth: 174628
Segmentation fault

snippet two:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int function(int depth) {
  int tmp0 = rand() % 65536;
  int tmp1 = rand() % 65536;
  tmp0 = tmp0 - 1;
  printf("val: %d, %d; addr: %p, %p; depth: %d\n", tmp0, tmp1, (void*)&tmp0, (void*)&tmp1, depth);
  tmp1 = function(++depth);
  return tmp1 - tmp0;
}
int main() {
  srand(time(NULL));
  int res = function(0);
  printf("%d\n", res);
  return 0;
}

output two:

...
val: 40745, 32446; addr: 0x7ffcb80b079c, 0x7ffcb80b0798; depth: 174528
val: 34014, 57470; addr: 0x7ffcb80b076c, 0x7ffcb80b0768; depth: 174529
val: 56801, 34478; addr: 0x7ffcb80b073c, 0x7ffcb80b0738; depth: 174530
Segmentation fault

I compiled both programs with gcc and, as expected, both of them cause a stack overflow. However, I initially expected snippet two to reach a much shallower depth, given that its function uses twice as much memory for locals. While snippet two does segfault a little earlier, the depths the two programs reach are actually very close.

If my naive theory were right, the function in snippet one calls itself 174,616 times, so it needs 4 Bytes * 174,616 / 1,024 = 682 KBytes; the function in snippet two calls itself 174,539 times, so it needs (4 + 4) Bytes * 174,539 / 1,024 = 1,363 KBytes.
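The naive estimate above can be reproduced in a few lines of C. This is only a sketch of the question's own model: it counts nothing but the int locals (assumed to be 4 bytes each) and ignores return addresses, saved registers and padding, which is exactly what the answers point out is missing. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* The question's naive model: the only stack cost per call is the int
   locals, assumed to be 4 bytes each. Integer division truncates, which
   matches the rounded-down figures quoted in the question. */
unsigned long long naive_stack_kib(unsigned ints_per_call, unsigned long long depth)
{
    assert(ints_per_call > 0);
    return (unsigned long long)ints_per_call * 4ULL * depth / 1024ULL;
}

/* Print the two estimates from the question. */
void print_naive_estimates(void)
{
    printf("snippet one: %llu KBytes\n", naive_stack_kib(1, 174616ULL));
    printf("snippet two: %llu KBytes\n", naive_stack_kib(2, 174539ULL));
}
```

Under this model, doubling the number of ints doubles the memory per call, so snippet two should crash at roughly half the depth; the outputs show it does not.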

So why is it like this?

snippet three:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int function(int depth) {
  int tmp0 = rand() % 65536;
  int tmp1 = rand() % 65536;
  int tmp2 = rand() % 65536;
  int tmp3 = rand() % 65536;
  int tmp4 = rand() % 65536;
  int tmp5 = rand() % 65536;
  tmp0 = tmp0 - 1;
  tmp1 = tmp1 + 1;
  tmp2 = tmp2 - 2;
  tmp3 = tmp3 + 2;
  tmp4 = tmp4 - 3;
  tmp5 = tmp5 + 3;
  printf("val: %d, %d, %d; addr: %p, %p, %p; depth: %d\n", tmp0, tmp1, tmp2, (void*)&tmp0, (void*)&tmp1, (void*)&tmp2, depth);
  tmp1 = function(++depth);
  return tmp0 - tmp1 + tmp2 - tmp3 + tmp4;
}
int main() {
  srand(time(NULL));
  int res = function(0);
  printf("%d\n", res);
  return 0;
}

output three:

val: 9366, 56113, 48970; addr: 0x7fff063fe830, 0x7fff063fe82c, 0x7fff063fe828; depth: 130920
val: 11924, 11633, 26004; addr: 0x7fff063fe7f0, 0x7fff063fe7ec, 0x7fff063fe7e8; depth: 130921
val: 13316, 42397, 45027; addr: 0x7fff063fe7b0, 0x7fff063fe7ac, 0x7fff063fe7a8; depth: 130922
val: 4285, 58053, 21693; addr: 0x7fff063fe770, 0x7fff063fe76c, 0x7fff063fe768; depth: 130923
Segmentation fault

Answer:

Keep in mind that local variables aren't the only thing stored on the stack. There are also parameters and the calling function's return address. So you're more likely looking at a 16-byte vs. 20-byte difference at a minimum, rather than 4 vs. 8.

If you look closely at the addresses printed by consecutive calls, you'll see that they differ by 48 bytes in both cases, so given that, it makes sense that the stack runs out at around the same depth.

So it seems the compiler is also inserting some padding, which the extra variable ends up occupying in the second case.
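The per-call stride can be checked directly from the printed addresses. A minimal sketch, assuming a 64-bit target whose stack grows downward (as on x86-64); the helper names are hypothetical and the addresses are copied from consecutive lines of outputs one, two and three.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Distance between the same local in two consecutive recursive calls.
   The stack grows downward on x86-64, so the outer (earlier) call
   has the higher address. */
uint64_t frame_stride(uint64_t outer_addr, uint64_t inner_addr)
{
    assert(outer_addr > inner_addr);
    return outer_addr - inner_addr;
}

/* Strides computed from the printed addresses in the three outputs. */
void print_strides(void)
{
    printf("one:   %" PRIu64 " bytes\n", frame_stride(0x7fff00dff78cULL, 0x7fff00dff75cULL));
    printf("two:   %" PRIu64 " bytes\n", frame_stride(0x7ffcb80b079cULL, 0x7ffcb80b076cULL));
    printf("three: %" PRIu64 " bytes\n", frame_stride(0x7fff063fe830ULL, 0x7fff063fe7f0ULL));
}
```

Multiplying the 48-byte stride by the last depth printed by snippet one gives 48 * 174,628 = 8,382,144 bytes, just under 8 MiB (8,388,608 bytes). That would be consistent with the common Linux default stack limit of 8 MiB (see ulimit -s), assuming that is the limit on the questioner's system.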

Answer:

Another, smaller experiment:

#include <stdio.h>
void function(int depth) {
  if (depth==0) return;
  int tmp = 65536;
  printf("%p\n", (void*)&tmp);
  function(--depth);
}
void function2(int depth) {
  if (depth==0) return;
  int tmp = 65536;
  int tmp2 = 65536;
  printf("%p %p\n", (void*)&tmp, (void*)&tmp2);
  function2(--depth);
}
int main() {
  function(2);
  function2(2);
  return 0;
}

can produce the following:

0x7ffee80f0988
0x7ffee80f0968
0x7ffee80f0988 0x7ffee80f0984
0x7ffee80f0968 0x7ffee80f0964

Here you can observe that the distance between the same local variable in two consecutive invocations is the same (32 bytes) for both functions. That probably means that in these cases each call reserves a constant amount of stack, and the same amount for both functions. If you add more local variables to the second function (four or five), that will change. The stack frame is probably allocated in a size rounded up to some multiple.
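The same measurement can be made inside a program instead of by reading printf output. This is a sketch with hypothetical names (probe_one, probe_two, stride_one, stride_two): each recursive call records the address of its local in a global array, and the difference between two levels is the per-call stack cost. Addresses are converted to uintptr_t because subtracting pointers to different objects is undefined behaviour; the values you get depend on the compiler, the flags and the ABI.

```c
#include <assert.h>
#include <stdint.h>

#define PROBE_DEPTH 3

/* Hypothetical helper state: address of the first local at each recursion level. */
uintptr_t probe_seen[PROBE_DEPTH];
/* Volatile sink so the second local's address also escapes and it stays in memory. */
volatile uintptr_t probe_sink;

/* One int local per call. */
int probe_one(int depth)
{
    int tmp = depth + 1;
    assert(depth >= 0 && depth < PROBE_DEPTH);
    probe_seen[depth] = (uintptr_t)&tmp;
    if (depth + 1 < PROBE_DEPTH)
        tmp += probe_one(depth + 1);
    return tmp;
}

/* Two int locals per call. */
int probe_two(int depth)
{
    int tmp0 = depth + 1;
    int tmp1 = depth + 2;
    assert(depth >= 0 && depth < PROBE_DEPTH);
    probe_seen[depth] = (uintptr_t)&tmp0;
    probe_sink = (uintptr_t)&tmp1;
    if (depth + 1 < PROBE_DEPTH)
        tmp1 += probe_two(depth + 1);
    return tmp0 + tmp1;
}

/* Byte distance between the first local of level 0 and of level 1.
   Assumes a downward-growing stack, as on x86-64. */
uintptr_t stride_one(void)
{
    probe_one(0);
    return probe_seen[0] - probe_seen[1];
}

uintptr_t stride_two(void)
{
    probe_two(0);
    return probe_seen[0] - probe_seen[1];
}
```

On an unoptimized x86-64 gcc build, stride_one() and stride_two() may well come out equal, as in the output above, but nothing guarantees it; optimization flags can change the frame layout or even inline some of the recursion.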

Answer:

So why is it like this?

Well, what you describe is correct in principle, but you are forgetting about compiler optimization. The compiler is allowed to perform all kinds of optimizations as long as they don't change the observable behavior of your program.

For instance, the compiler could decide to keep all your tmp variables in CPU registers. In that case no stack memory would be assigned to them.

You can try to force them into memory by printing their addresses, like:

printf("%p\n", (void*)&tmp);

Another thing that may happen is that the compiler adjusts the stack pointer by the same amount for both functions. On my system the compiler reserves 32 bytes in both cases; I had to use 5 tmp variables before the compiler changed it to 48. In other words, don't expect a function to reserve exactly what it needs; it is allowed to reserve more. On my system it seems to always change the stack pointer by a multiple of 16 (N x 16).

To get a better understanding, you need to look at the generated machine code. For that you can use gcc -S.
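The "N x 16" pattern is ordinary rounding up to an alignment boundary. The sketch below is a deliberately simplified model, not gcc's actual frame-layout logic: it ignores the spilled parameter, saved registers and the return address, so it does not reproduce the exact 32 and 48 byte figures above. It only shows how several different amounts of local data can map to the same reserved size. The x86-64 System V ABI requires the stack pointer to be 16-byte aligned at each call, which is one reason compilers reserve stack in multiples of 16 there. The function names are hypothetical, and sizeof(int) is assumed to be 4.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Simplified model: bytes reserved for k int locals, rounded to 16. */
size_t reserved_for_ints(size_t k)
{
    return round_up(k * sizeof(int), 16);
}
```

With 4-byte ints, one and two locals both round to 16 bytes, while five locals (20 bytes) round to 32; the extra int only costs stack once it spills past the current 16-byte block.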

You can also check this https://godbolt.org/z/x5c3bj7M9

Both functions start with:

    push    rbp
    mov     rbp, rsp
    sub     rsp, 32   <------ Stack pointer change, i.e. reserving memory
    mov     DWORD PTR [rbp-20], edi

so both functions use the same amount of stack per call.

The same code compiled with -O2 gives a completely different result: https://godbolt.org/z/f6jMEqaPd
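Those few instructions are enough to account for the 48-byte stride seen in the question. A sketch, assuming x86-64 (the call instruction pushes an 8-byte return address and each push saves an 8-byte register) and assuming the question's programs were built with gcc's default of no optimization, so that their prologues look like this one: each call costs the return address, the saved rbp, and the sub rsp amount. The function name is hypothetical.

```c
#include <assert.h>

/* Per-call stack cost on x86-64 for a prologue of the form
       push <reg>      (repeated pushed_regs times)
       sub  rsp, N     (N = sub_bytes)
   plus the 8-byte return address that the call instruction pushes. */
unsigned frame_bytes(unsigned pushed_regs, unsigned sub_bytes)
{
    const unsigned return_address = 8; /* pushed by call */
    const unsigned slot = 8;           /* each push saves one 64-bit register */
    assert(sub_bytes % 8 == 0);
    return return_address + slot * pushed_regs + sub_bytes;
}
```

frame_bytes(1, 32) is 48, matching the stride in outputs one and two. frame_bytes(1, 48), the reservation the answer saw once five tmp variables were used, is 64, which is consistent with the 0x40 stride in output three (snippet three has six).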

The first function gives:

    push    rbx
    mov     ebx, edi
    sub     rsp, 48   <------ Stack pointer change, i.e. reserving memory

but the second gives:

    push    rbx
    mov     ebx, edi
    sub     rsp, 16   <------ Stack pointer change, i.e. reserving memory
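Reading these -O2 prologues the same way (8-byte return address, one pushed rbx, then the sub rsp amount) gives 8 + 8 + 48 = 64 bytes per call for the first function and 8 + 8 + 16 = 32 for the second, so here the version with more variables would actually be the cheaper one. A rough sketch of what that means for recursion depth, assuming an 8 MiB stack (a common Linux default; check ulimit -s) and applying only to the answer's Godbolt functions, not to the question's snippets. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: an 8 MiB stack, a common Linux default. */
#define ASSUMED_STACK_LIMIT (8ULL * 1024ULL * 1024ULL)

/* Rough upper bound on recursion depth when every call costs frame_bytes.
   Ignores main's own frame, argv/environment data and printf's frames. */
uint64_t max_frames(uint64_t frame_bytes)
{
    assert(frame_bytes > 0);
    return ASSUMED_STACK_LIMIT / frame_bytes;
}
```

max_frames(64) is 131,072 and max_frames(32) is 262,144. As a sanity check, max_frames(48) is 174,762, only slightly above the 174,628 calls snippet one reached before crashing, which fits the idea that the fixed frame size, not the number of ints, sets the depth limit.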

The conclusion is: You can't just look at the C code and figure out what will happen. You need the machine code.