When joining four 1-byte vars into one 4-byte word, which is a faster way to shift and OR ? (compari-CodePudding

So I'm currently studying bit-wise operators and bit-manipulation, and I have come across two different ways to combine four 1-byte words into one 4-byte wide word.

the two ways are given below

After finding out this two methods I compare the disassembly code generated by the two (compiled using gcc 11 with -O2 flag), I don't have the basic knowledge with disassembly and with the code it generates, and what I only know is the shorter the code, the faster the function is (most of the time I guess... maybe there are some exceptions), now for the two methods it seems that they have the same numbers/counts of lines in the generated disassembly code, so I guess they have the same performance?

I also got curious about the order of the instructions, the first method seems to alternate other instructions sal>or>sal>or>sal>or, while the second one is more uniform sal>sal>sal>or>or>mov>or does this have some significant effect in the performance say for example if we are dealing with a larger word?

Two methods

int method1(unsigned char byte4, unsigned char byte3, unsigned char byte2, unsigned char byte1)
{
    int combine = 0;
    combine = byte4;
    combine <<=8;
    combine |= byte3;
    combine <<=8;
    combine |= byte2;
    combine <<=8;
    combine |= byte1;
    return combine;
}

int method2(unsigned char byte4, unsigned char byte3, unsigned char byte2, unsigned char byte1)
{
    int combine = 0, temp;
    temp = byte4;
    temp <<= 24;
    combine |= temp;
    temp = byte3;
    temp <<= 16;
    combine |= temp;
    temp = byte2;
    temp <<= 8;
    combine |= temp;
    temp = byte1;
    combine |= temp;
    return combine;
}

Disassembly

// method1(unsigned char, unsigned char, unsigned char, unsigned char):
        movzx   edi, dil
        movzx   esi, sil
        movzx   edx, dl
        movzx   eax, cl
        sal     edi, 8
        or      esi, edi
        sal     esi, 8
        or      edx, esi
        sal     edx, 8
        or      eax, edx
        ret
// method2(unsigned char, unsigned char, unsigned char, unsigned char):
        movzx   edx, dl
        movzx   ecx, cl
        movzx   esi, sil
        sal     edi, 24
        sal     edx, 8
        sal     esi, 16
        or      edx, ecx
        or      edx, esi
        mov     eax, edx
        or      eax, edi
        ret

This might be "premature optimization", but I just want to know if there is a difference.

CodePudding user response：

I completely agree with Émerick Poulin:

So the real answer, would be to benchmark it on your target machine to see which one is faster.

Nevertheless, I created a "method3()", and disassembled all three with gcc version 10.3.0, with -O0 and -O2, Here's a summary of the -O2 results:

Method3:

int method3(unsigned char byte4, unsigned char byte3, unsigned char byte2, unsigned char byte1)
{
    int combine = (byte4 << 24)|(byte3<<16)|(byte2<<8)|byte1;
    return combine;
}

gcc -O2 -S:

;method1:
    sall    $8,