Question> What is the recommended way to pass __int128_t as a function parameter?
Thank you
#include <iostream>

bool CheckInt(const __int128_t& large_number)
{
    return large_number > 10000; // Just for Demo
}

bool CheckInt2(__int128_t large_number)
{
    return large_number > 10000;
}

int main()
{
    __int128_t abc = 20000;
    std::cout << CheckInt(abc) << std::endl;
    std::cout << CheckInt2(abc) << std::endl;
    return 0;
}
CodePudding user response:
Let's look at four scenarios.
These were compiled by gcc for a 64-bit x86 architecture; other compilers should give similar results.
- How the functions are compiled:
bool by_value(__int128 large_number) {
    return large_number > 10000;
}
bool by_reference(const __int128& large_number) {
    return large_number > 10000;
}
And we can see the x86 assembler output here https://godbolt.org/z/v9cM8xj35
by_value(__int128):
mov eax, 10000
cmp rax, rdi # Use first 8 bytes
mov eax, 0
sbb rax, rsi # Use second 8 bytes
setl al
ret
by_reference(__int128 const&):
mov eax, 10000
cmp rax, QWORD PTR [rdi] # Use first 8 bytes
mov eax, 0
sbb rax, QWORD PTR [rdi+8] # Use second 8 bytes
setl al
ret
The commented lines are the only lines that differ.
This shows the calling convention of the platform: the first 8 bytes of the argument are passed in rdi, the second 8 bytes in rsi.
When you pass by value, large_number is stored in these two registers and can be used quickly and efficiently.
When you pass by reference, only one register (rdi) is used, and it holds a pointer to the value. The first 8 bytes are read with the dereference QWORD PTR [rdi], and the second 8 bytes with QWORD PTR [rdi+8] (some pointer arithmetic).
Passing by value will win out in most situations here. If you have a lot of arguments or local variables in your functions, the registers used to store large_number may "spill" onto the stack, so theoretically passing by value could need more work. But something would probably spill whether the argument is a one-register pointer or a two-register 16-byte value, so there shouldn't be much difference in practice.
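To make the "first 8 bytes" and "second 8 bytes" concrete, here is a minimal sketch that splits an __int128 into the two 64-bit halves that travel in rdi and rsi when the value is passed by value. The helper names low_half and high_half are hypothetical, and the code assumes GCC or Clang on x86-64, where __int128 is available and is 16 bytes wide.

```cpp
#include <cstdint>

// Assumption: GCC/Clang on x86-64, where __int128 is a 16-byte type.
static_assert(sizeof(__int128) == 16, "expected a 16-byte __int128");

// Hypothetical helper: the low 8 bytes, i.e. the half passed in rdi.
constexpr std::uint64_t low_half(__int128 v) {
    return static_cast<std::uint64_t>(v);
}

// Hypothetical helper: the high 8 bytes, i.e. the half passed in rsi.
// The right shift keeps the sign, so negative values give a negative high half.
constexpr std::int64_t high_half(__int128 v) {
    return static_cast<std::int64_t>(v >> 64);
}
```

For a small positive value such as 20000, low_half returns the value itself and high_half returns 0, so the second register carries nothing but zero bits.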
- Calling the function with an existing __int128 variable:
bool by_value(__int128);
bool by_reference(const __int128&);
extern __int128 x;
extern bool call_by_value() {
    return by_value(x);
}
extern bool call_by_reference() {
    return by_reference(x);
}
https://godbolt.org/z/7sT8b33Ez
call_by_value():
mov rdi, QWORD PTR x[rip]
mov rsi, QWORD PTR x[rip+8]
jmp by_value(__int128)
call_by_reference():
mov edi, OFFSET FLAT:x
jmp by_reference(__int128 const&)
It may look like more work needs to be done in the by-value case: to call by reference, you only need to load the address of x (OFFSET FLAT:x) into edi and call the function, whereas in the by-value case the value of x has to be read into the two registers before the function can be called.
However, recall that by_reference has to indirect through the pointer to use the value. So the by-reference version is only hiding the loads of x[rip] and x[rip+8] inside the function, and there isn't much difference.
- Calling the function with some constant value (or something that optimizes to it):
bool call_by_value() {
    __int128 abc = 20000;
    return by_value(abc);
}
bool call_by_reference() {
    __int128 abc = 20000;
    return by_reference(abc);
}
https://godbolt.org/z/6jhEWfh6a
call_by_value():
mov edi, 20000 # Stores 20000 into the first register
xor esi, esi # Stores 0 into the second register
jmp by_value(__int128)
call_by_reference():
sub rsp, 24
mov rdi, rsp # Store current stack pointer (which will point to abc)
mov QWORD PTR [rsp], 20000 # Store first 8 bytes on stack
mov QWORD PTR [rsp+8], 0 # Store second 8 bytes on the stack
call by_reference(__int128 const&)
add rsp, 24
ret
Calling by reference needs to do a lot: the value has to be written to the stack, and then a pointer to it is passed to the function.
Calling by value can just store the value into the two registers and call the function.
- Calling the function with a runtime-calculated prvalue (here the "calculation" is just a copy):
bool call_by_value() {
    return by_value(+x);
}
bool call_by_reference() {
    return by_reference(+x);
}
https://godbolt.org/z/vqdGEeGY9
call_by_value():
mov rdi, QWORD PTR x[rip]
mov rsi, QWORD PTR x[rip+8]
jmp by_value(__int128)
call_by_reference():
sub rsp, 24
movdqa xmm0, XMMWORD PTR x[rip] # Store the value of x into a 16 byte register
mov rdi, rsp # Store current stack pointer
movaps XMMWORD PTR [rsp], xmm0 # Write 16 bytes to the stack pointer
call by_reference(__int128 const&)
add rsp, 24
ret
So to pass the result of a calculation, in the by-value case the calculation can be done directly in registers. In the by-reference case, the value needs to be calculated, stored onto the stack, and then a pointer to it needs to be passed.
There is one more issue: when you have extern bool by_reference(const __int128&); and you don't have whole-program optimization or link-time optimization, the compiler can't know that by_reference doesn't modify the value it is passed. After all, it could look like:
bool by_reference(const __int128& large_number) {
    const_cast<__int128&>(large_number) = 0; // Legal if the caller's object is not itself const
    return false;
}
This can disable some further optimizations.
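As an illustration of why the caller cannot trust the const in the signature, here is a small sketch of the caller's side. The names sneaky_by_reference and caller_value_changed are hypothetical. The write through const_cast is well-defined here only because the caller's variable is not declared const.

```cpp
// Hypothetical callee: looks read-only from its signature, but writes
// through the reference after casting away const.
bool sneaky_by_reference(const __int128& large_number) {
    bool result = large_number > 10000;
    const_cast<__int128&>(large_number) = 0;  // overwrites the caller's object
    return result;
}

// Hypothetical caller: returns true if the callee changed `value`.
bool caller_value_changed() {
    __int128 value = 20000;  // non-const, so the write above is well-defined
    bool was_large = sneaky_by_reference(value);
    // If the callee lives in another translation unit, the compiler has to
    // re-read `value` from memory here instead of assuming it is still 20000.
    return was_large && value == 0;
}
```

With a by-value parameter the callee only ever sees a copy, so the caller's variable is known to be unchanged after the call.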
All in all, it is better in most cases to pass by value. On other architectures, the default calling convention may be to pass 16 byte arguments on the stack, which would make both cases not too different.
Some people will say that you should only pass something the size of a pointer or smaller by value, and everything else should be passed by reference. However, this fails to account for how much faster registers are than the stack.
This was based on the analysis of the assembler, not on actual timings. You would probably have to call a function many, many times for this to make a difference.
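If you want actual timings, a rough harness along the following lines could be a starting point. It is only a sketch: time_calls is a hypothetical helper, __attribute__((noinline)) is a GCC/Clang extension used here to stop the calls from being inlined away, and the numbers it produces will depend heavily on the compiler, flags, and machine.

```cpp
#include <chrono>
#include <cstdint>

// Assumption: GCC or Clang. noinline keeps the two calls as real calls.
__attribute__((noinline)) bool by_value(__int128 large_number) {
    return large_number > 10000;
}
__attribute__((noinline)) bool by_reference(const __int128& large_number) {
    return large_number > 10000;
}

// Hypothetical harness: calls `fn` `iterations` times, stores the number of
// true results in `hits`, and returns the elapsed time in nanoseconds.
template <typename Fn>
std::int64_t time_calls(Fn fn, std::uint64_t iterations, std::uint64_t& hits) {
    hits = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        __int128 n = static_cast<__int128>(i) * 7;  // vary the argument per call
        hits += fn(n);                              // use the result so it is not discarded
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling time_calls(by_value, n, hits) and time_calls(by_reference, n, hits) with the same large n, and repeating each run several times, gives a rough comparison. Both calls should report the same hit count.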