Bitshifting results have leading ones sometimes

While trying to read and write some unsigned long long int values to and from a file, I encountered a problem when deserializing the values. The boiled-down problem can be reproduced with the following code. Only sometimes does bitshifting by more than 32 bits result in a value with leading ones. Why is that?

#include <iostream>

int main() {
    unsigned char* myBuffer = new unsigned char[16] {
        (unsigned char)0xb0,
        (unsigned char)0xf7,
        (unsigned char)0x80,
        (unsigned char)0x01,
        (unsigned char)0x00,
        (unsigned char)0x00,
        (unsigned char)0x00,
        (unsigned char)0x00,

        (unsigned char)0xf0,
        (unsigned char)0xc0,
        (unsigned char)0x49,
        (unsigned char)0x89,
        (unsigned char)0x29,
        (unsigned char)0x00,
        (unsigned char)0x00,
        (unsigned char)0x00
    };

    unsigned long long int firstValue = 0;
    unsigned long long int secondValue = 0;

    // Assemble two 64-bit values from 8 little-endian bytes each.
    for (int i = 0; i < 8; i++) {
        firstValue  |= myBuffer[i]     << (8 * i);
        secondValue |= myBuffer[i + 8] << (8 * i);

        std::cout << "first buffer value "   << std::hex << (int)myBuffer[i]
                  << " second buffer value " << std::hex << (int)myBuffer[i + 8]
                  << " first value "         << std::hex << firstValue
                  << " second value "        << secondValue << std::endl;
    }

    delete[] myBuffer;
    return 0;
}

Output

first buffer value b0 second buffer value f0 first value b0 second value f0
first buffer value f7 second buffer value c0 first value f7b0 second value c0f0
first buffer value 80 second buffer value 49 first value 80f7b0 second value 49c0f0
first buffer value 1 second buffer value 89 first value 180f7b0 second value ffffffff8949c0f0
first buffer value 0 second buffer value 29 first value 180f7b0 second value ffffffff8949c0f9
first buffer value 0 second buffer value 0 first value 180f7b0 second value ffffffff8949c0f9
first buffer value 0 second buffer value 0 first value 180f7b0 second value ffffffff8949c0f9
first buffer value 0 second buffer value 0 first value 180f7b0 second value ffffffff8949c0f9

Solution

I know how to fix this issue: casting the unsigned chars to unsigned long long int before bitshifting makes everything work:

secondValue |= ((unsigned long long int)myBuffer[i + 8]) << (8 * i);

I still just want to know why this is happening only sometimes.

Answer

If you try this on a platform where int is 32 bits wide (which includes the usual 64-bit desktop platforms):

unsigned char c = 0x80;
std::cout << (c << 24) << "\n";

you will get -2147483648. This is because an operand whose type is narrower than int, such as unsigned char, is promoted to int before the operation (see "Integral promotion" on this page), and the result of << has the type of the promoted left operand. And int is signed.
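To see the promotion without running anything, the compiler can be asked for the types directly. A minimal sketch, assuming C++11 or later and an ordinary platform where int can represent every unsigned char value; `c` mirrors the variable in the snippet above but is made a compile-time constant here:

```cpp
#include <type_traits>

// Same value as in the snippet above, as a compile-time constant.
constexpr unsigned char c = 0x80;

// Integral promotion: unary + promotes an unsigned char operand to int.
static_assert(std::is_same<decltype(+c), int>::value,
              "unsigned char is promoted to int");

// For <<, the result has the type of the promoted left operand,
// so `c << 24` is a signed int, not an unsigned type.
static_assert(std::is_same<decltype(c << 24), int>::value,
              "c << 24 has type int");

// Widening the left operand first keeps the whole shift unsigned and 64-bit.
static_assert(std::is_same<decltype(static_cast<unsigned long long>(c) << 24),
                           unsigned long long>::value,
              "cast first: result is unsigned long long");
```

decltype leaves its operand unevaluated, so none of these checks actually performs a shift.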

So the result of the expression myBuffer[i + 8] << (8 * i) is a signed 32-bit integer. And this will be negative after a 24-bit shift if myBuffer[i + 8] has the top bit set.

Then, when this negative 32-bit value is OR-ed into the unsigned long long secondValue, it is converted to 64 bits, and for a negative value that conversion amounts to sign extension, which sets all of the upper 32 bits (see "Integral conversions" on the same page; note that the rules for integral conversion and the rules for integral promotion are not the same).
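The "only sometimes" part can be reproduced with well-defined unsigned arithmetic. This is a sketch of the effect, not of the compiler's actual code path: it assumes a 32-bit two's-complement int and shift counts of at most 24 (the first four iterations of the question's loop), and the names intBits and buggyContribution are hypothetical:

```cpp
#include <cstdint>

// The 32-bit pattern that the promoted (signed) int ends up holding
// for `byte << shift`, computed in unsigned arithmetic to avoid any UB.
// Assumes shift <= 24, as in the first four loop iterations.
constexpr std::uint32_t intBits(unsigned char byte, unsigned shift) {
    return static_cast<std::uint32_t>(byte) << shift;
}

// What `unsigned long long |= byte << shift` contributes on a platform with
// 32-bit two's-complement int: if bit 31 of the pattern is set, the int is
// negative and the conversion to 64 bits fills the upper half with ones.
constexpr std::uint64_t buggyContribution(unsigned char byte, unsigned shift) {
    return (intBits(byte, shift) & 0x80000000u) != 0
               ? 0xFFFFFFFF00000000ULL | intBits(byte, shift)
               : static_cast<std::uint64_t>(intBits(byte, shift));
}
```

OR-ing the contributions of 0xf0, 0xc0, 0x49 and 0x89 at shifts 0, 8, 16 and 24 gives 0xffffffff8949c0f0, the second value in the question's fourth output line. For the first buffer the byte at shift 24 is 0x01, whose top bit is clear, so the first value stays 0x180f7b0.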

PS You don't need all those (unsigned char) casts in your array initialisation. This isn't Java!
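For reference, the question's buffer can be written without the casts. A sketch assuming C++11 or later; kBuffer is a hypothetical name, and the same rule applies to the braced initializer in the question's new-expression:

```cpp
// Each hex literal is an int constant whose value fits in unsigned char,
// so brace-initialization accepts it: that is not a narrowing conversion.
// A constant that does not fit, such as 0x100, would be rejected as narrowing.
constexpr unsigned char kBuffer[16] = {
    0xb0, 0xf7, 0x80, 0x01, 0x00, 0x00, 0x00, 0x00,
    0xf0, 0xc0, 0x49, 0x89, 0x29, 0x00, 0x00, 0x00,
};
```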

Answer

In the 4th iteration (i = 3) you have:

first buffer value 1 second buffer value 89 first value 180f7b0 second value ffffffff8949c0f0

This comes from the types used in the arithmetic operations:

secondValue |= myBuffer[i + 8] << (8 * i);
    ULL        unsigned char 0x89  int 24

The first thing that happens is integral promotion:

secondValue |= myBuffer[i + 8] << (8 * i);
    ULL         int 0x00000089 << int 24
                         int 0x89000000
                            -0x77000000

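You shifted a bit into the MSB of the integer, making it negative. Is this an integer overflow and UB? Look up the specifics for << on signed integers: in C it is undefined behaviour; in C++14 and C++17 the bit pattern 0x89000000 is computed as an unsigned value and its conversion to int is implementation-defined; since C++20 the result is defined as the two's-complement value -0x77000000.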
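Beyond the sign bit there is a second problem in the question's loop: from i = 4 on, an int is shifted by 32 or more bits, which the standard leaves undefined whatever the value. The fifth output line (second value ending in f9 rather than f0) is consistent with x86 hardware masking the shift count to 5 bits, so that 0x29 was effectively shifted by 0 and OR-ed into the low byte; that is one plausible explanation, not guaranteed behaviour. The hypothetical check below, assuming C++11 or later, tests both conditions for a byte shifted inside a promoted int:

```cpp
#include <climits>

// Hypothetical helper: would `byte << shift`, evaluated in a promoted int,
// stay clear of both pitfalls seen in the question's loop?
//  1. The shift count must be smaller than the width of int; larger counts
//     are undefined behaviour regardless of the value being shifted.
//  2. No set bit may reach the sign bit, or the int turns negative.
constexpr bool promotedShiftIsSafe(unsigned char byte, unsigned shift) {
    // The short-circuit guarantees shift < width of int below, so the
    // 64-bit unsigned shift is always well defined.
    return shift < sizeof(int) * CHAR_BIT &&
           (static_cast<unsigned long long>(byte) << shift) <=
               static_cast<unsigned long long>(INT_MAX);
}
```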

Next comes the integral conversion to unsigned long long. It is defined as the value modulo 2^64, which for a negative two's-complement int amounts to sign extension:

secondValue |= myBuffer[i + 8] << (8 * i);
    ULL               int -0x77000000
    ULL                       ULL
0x000000000049c0f0 |= 0xFFFFFFFF89000000
         ULL  0xffffffff8949c0f0
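The conversion step can be checked at compile time. A minimal sketch assuming C++11 or later and the fixed-width types from <cstdint>; the two function names are hypothetical. It contrasts the direct conversion, which yields the 0xFFFFFFFF89000000 in the walk-through above, with a detour through a 32-bit unsigned type, which keeps the upper half zero:

```cpp
#include <cstdint>

// Converting an int to a 64-bit unsigned type is defined as "value modulo 2^64".
// For a negative int on two's-complement hardware that is exactly sign
// extension: the upper 32 bits all become ones.
constexpr std::uint64_t signExtendingWiden(std::int32_t v) {
    return static_cast<std::uint64_t>(v);
}

// Going through a 32-bit unsigned type first keeps the same low 32 bits
// but zero-extends, leaving the upper 32 bits clear.
constexpr std::uint64_t zeroExtendingWiden(std::int32_t v) {
    return static_cast<std::uint32_t>(v);
}
```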

There is just one fix: cast the unsigned char to an unsigned integer type at least 64 bits wide before shifting. Being unsigned, it avoids the conversion to a signed int and the unwanted sign extension; being 64 bits wide, it also makes the shifts by 32 to 56 bits valid. You should also make i unsigned so there is no mix of signed and unsigned types. And since we don't live in the last millennium, use the exact-width integer types.

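// uint64_t comes from <cstdint>
uint64_t firstValue = 0;
uint64_t secondValue = 0;

for (unsigned int i = 0; i < 8; i++) {
    // Widen to 64 bits before shifting: no promotion to signed int,
    // no sign extension, and shifts of up to 56 bits are well defined.
    firstValue  |= ((uint64_t)myBuffer[i])     << (8 * i);
    secondValue |= ((uint64_t)myBuffer[i + 8]) << (8 * i);
}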
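The corrected loop can also be wrapped in a small reusable function. This is a sketch, assuming C++14 or later (for the loop inside a constexpr function) and a buffer whose bytes are stored least-significant first, as in the question; loadLittleEndian64 is a hypothetical name:

```cpp
#include <cstdint>

// Hypothetical helper: assembles 8 bytes stored least-significant first
// (little-endian, as in the question's buffer) into one 64-bit value.
// Each byte is widened to uint64_t before it is shifted, so there is no
// promotion to signed int, no sign extension, and shifts of 32..56 bits
// are well defined.
constexpr std::uint64_t loadLittleEndian64(const unsigned char* bytes) {
    std::uint64_t value = 0;
    for (unsigned int i = 0; i < 8; i++) {
        value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    }
    return value;
}
```

With the question's buffer, loadLittleEndian64(myBuffer) gives 0x180f7b0 and loadLittleEndian64(myBuffer + 8) gives 0x298949c0f0, with no leading ones.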