I was trying to understand the performance of global static variables and came across a very weird scenario. The code below takes about 525 ms on average:
    static unsigned long long s_Data = 1;
    int main()
    {
        unsigned long long x = 0;
        for (int i = 0; i < 1'000'000'000; i++)
        {
            x += i + s_Data;
        }
        return 0;
    }
And the code below takes about 1050 ms on average:
    static unsigned long long s_Data = 1;
    int main()
    {
        unsigned long long x = 0;
        for (int i = 0; i < 1'000'000'000; i++)
        {
            x += i;
        }
        return 0;
    }
I am aware, based on my other tests, that reading static variables is fast and writing to them is slow, but I am not sure what I am missing in the scenario above. Note: compiler optimizations were turned off and the MSVC compiler was used for the tests.
CodePudding user response:
To address the actual question: with optimizations turned off, we can turn to the generated assembly to get an idea of why one runs more quickly than the other.
For the first test, GCC (trunk) https://godbolt.org/z/GdssT9vME produces this assembly:
s_Data:
    .quad 1
main:
    push rbp
    mov rbp, rsp
    mov QWORD PTR [rbp-8], 0
    mov DWORD PTR [rbp-12], 0
    jmp .L2
.L3:
    mov eax, DWORD PTR [rbp-12]
    movsx rdx, eax
    mov rax, QWORD PTR s_Data[rip]
    add rax, rdx
    add QWORD PTR [rbp-8], rax
    add DWORD PTR [rbp-12], 1
.L2:
    cmp DWORD PTR [rbp-12], 999999999
    jle .L3
    mov eax, 0
    pop rbp
    ret
For the second test https://godbolt.org/z/5ndnEv5Ts we get:
main:
    push rbp
    mov rbp, rsp
    mov QWORD PTR [rbp-8], 0
    mov DWORD PTR [rbp-12], 0
    jmp .L2
.L3:
    mov eax, DWORD PTR [rbp-12]
    cdqe
    add QWORD PTR [rbp-8], rax
    add DWORD PTR [rbp-12], 1
.L2:
    cmp DWORD PTR [rbp-12], 999999999
    jle .L3
    mov eax, 0
    pop rbp
    ret
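Comparing these two programs, the first is sixteen instructions while the second is only fourteen; inside the loop itself (from .L3 through jle) that is eight instructions per iteration versus six. Instruction count alone does not determine run time, though: different instructions also have different CPU cycle overheads.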
See also: "How many CPU cycles are needed for each assembly instruction?"
As noted in my comment, optimizations vastly change the generated assembly.
With -O2, both tests produce this:
main:
xor eax, eax
ret