During my profiling with flamegraph, I found that callstacks are sometimes broken even when all codebases are compiled with the -fno-omit-frame-pointer
flag. By checking the binary generated by gcc, I noticed gcc may reorder x86 frame pointer saving/setup instructions (i.e., push %rbp; move %rsp, %rbp
), sometimes even after ret
instructions of some branches. As shown in the example below, push %rbp; move %rsp, %rbp
are put at the bottom of the function. It leads to incomplete and misleading callstacks when perf happens to sample instructions in the function before frame pointers are properly set.
C code:
int flextcp_fd_slookup(int fd, struct socket **ps)
{
struct socket *s;
if (fd >= MAXSOCK || fhs[fd].type != FH_SOCKET) {
errno = EBADF;
return -1;
}
uint32_t lock_val = 1;
s = fhs[fd].data.s;
asm volatile (
"1:\n"
"xchg %[locked], %[lv]\n"
"test %[lv], %[lv]\n"
"jz 3f\n"
"2:\n"
"pause\n"
"cmpl $0, %[locked]\n"
"jnz 2b\n"
"jmp 1b\n"
"3:\n"
: [locked] "=m" (s->sp_lock), [lv] "=q" (lock_val)
: "[lv]" (lock_val)
: "memory");
*ps = s;
return 0;
}
CMake Debug
Profile:
0000000000007c73 <flextcp_fd_slookup>:
7c73: f3 0f 1e fa endbr64
7c77: 55 push %rbp
7c78: 48 89 e5 mov %rsp,%rbp
7c7b: 48 83 ec 20 sub $0x20,%rsp
7c7f: 89 7d ec mov