So I'm currently studying bit-wise operators and bit-manipulation, and I have come across two different ways to combine four 1-byte words into one 4-byte wide word.
the two ways are given below
After finding out this two methods I compare the disassembly code generated by the two (compiled using gcc 11 with -O2 flag), I don't have the basic knowledge with disassembly and with the code it generates, and what I only know is the shorter the code, the faster the function is (most of the time I guess... maybe there are some exceptions), now for the two methods it seems that they have the same numbers/counts of lines in the generated disassembly code, so I guess they have the same performance?
I also got curious about the order of the instructions, the first method seems to alternate other instructions sal>or>sal>or>sal>or
, while the second one is more uniform sal>sal>sal>or>or>mov>or
does this have some significant effect in the performance say for example if we are dealing with a larger word?
Two methods
int method1(unsigned char byte4, unsigned char byte3, unsigned char byte2, unsigned char byte1)
{
int combine = 0;
combine = byte4;
combine <<=8;
combine |= byte3;
combine <<=8;
combine |= byte2;
combine <<=8;
combine |= byte1;
return combine;
}
int method2(unsigned char byte4, unsigned char byte3, unsigned char byte2, unsigned char byte1)
{
int combine = 0, temp;
temp = byte4;
temp <<= 24;
combine |= temp;
temp = byte3;
temp <<= 16;
combine |= temp;
temp = byte2;
temp <<= 8;
combine |= temp;
temp = byte1;
combine |= temp;
return combine;
}
Disassembly
// method1(unsigned char, unsigned char, unsigned char, unsigned char):
movzx edi, dil
movzx esi, sil
movzx edx, dl
movzx eax, cl
sal edi, 8
or esi, edi
sal esi, 8
or edx, esi
sal edx, 8
or eax, edx
ret
// method2(unsigned char, unsigned char, unsigned char, unsigned char):
movzx edx, dl
movzx ecx, cl
movzx esi, sil
sal edi, 24
sal edx, 8
sal esi, 16
or edx, ecx
or edx, esi
mov eax, edx
or eax, edi
ret
This might be "premature optimization", but I just want to know if there is a difference.
CodePudding user response:
Without optimizations, method 2 seems to be a tiny bit faster.
This is an online benchmark of the code provided: https://quick-bench.com/q/eyiiXkxYVyoogefHZZMH_IoeJss
However, it is difficult to get accurate metrics from tiny operations like this. It also depends on the speed of the instructions on a given CPU (different instructions can take more or less clock cycles).
Furthermore, bitwise operators tend to be pretty fast since they are "basic" instructions.
So the real answer, would be to benchmark it on your target machine to see which one is faster.
CodePudding user response:
I completely agree with Émerick Poulin:
So the real answer, would be to benchmark it on your target machine to see which one is faster.
Nevertheless, I created a "method3()", and disassembled all three with gcc version 10.3.0, with -O0 and -O2, Here's a summary of the -O2 results:
Method3:
int method3(unsigned char byte4, unsigned char byte3, unsigned char byte2, unsigned char byte1)
{
int combine = (byte4 << 24)|(byte3<<16)|(byte2<<8)|byte1;
return combine;
}
gcc -O2 -S:
;method1:
sall $8,