Adding int type to uint64

I have a question regarding conversion of integers:

#include <iostream>
#include <cstdint>
using namespace std;

int main()
{
    
    int N,R,W,H,D;
    uint64_t sum = 0;
    uint64_t sum_2 = 0;
    cin >> W >> H >> D;
    sum  = static_cast<uint64_t>(W) * H * D * 100;
    sum_2  = W * H * D * 100;
    
    cout << sum << endl;
    cout << sum_2 << endl;
    return 0;
}

I thought that sum should be equal to sum_2, because the uint64_t type is bigger than the int type, and during arithmetic operations the compiler chooses the bigger type (which is uint64_t). So, by my understanding, sum_2 must have uint64_t type. But it behaves as if it has int type.

Can you explain to me why sum_2 was converted to int? Why didn't it stay uint64_t?

CodePudding user response：

Undefined behavior signed-integer overflow/underflow, and well-defined behavior unsigned-integer overflow/underflow, in C and C++

If I enter 200, 300, and 358 for W, H, and D, I get the following output, which makes perfect sense for my gcc compiler on a 64-bit Linux machine:

2148000000
18446744071562584320

Why does this make perfect sense?

Well, the default type is int, which is a 32-bit type (the same width as int32_t) for the gcc compiler on a 64-bit Linux machine. Its max value is 2^32/2 - 1 = 2147483647, and its min value is -2147483648. The line sum_2 = W * H * D * 100; does int arithmetic, since that is the type of every operand there, 100 included, and no explicit cast is used. Only after the int arithmetic is finished is the int result implicitly converted to uint64_t, as it is stored into the uint64_t sum_2 variable. The mathematically correct product on the right-hand side, however, is 2148000000, which is larger than the max int value. The int arithmetic therefore overflows, which is undefined behavior for signed integers; here it happens to roll over the top of the max int value, back down to the min int value, and up again.

Even though signed integer overflow or underflow is undefined behavior according to the C and C++ standards, I know that with the gcc compiler signed integer overflow happens to roll over to negative values. 2148000000 - 2147483647 = 516353 rollover counts. The first count up rolls over to the min int32_t value of -2147483648, and the next (516353 - 1 = 516352) counts go up to -2147483648 + 516352 = -2146967296. So, the result of W * H * D * 100 for the inputs above is now -2146967296, based on undefined behavior. Next, that value is implicitly converted from an int (int32_t in this case) to a uint64_t in order to store it into the uint64_t sum_2 variable. This conversion is well-defined (the value is reduced modulo 2^64) and behaves like unsigned integer underflow. You start with -2146967296. The first down-count underflows down to uint64_t max, which is 2^64 - 1 = 18446744073709551615. Now subtract the remaining 2146967296 - 1 = 2146967295 counts from that and you get 18446744073709551615 - 2146967295 = 18446744071562584320, just as shown above!

Voila! With a little compiler and hardware architecture understanding, and some expected but undefined behavior, the result is perfectly explainable and makes sense!

To easily see the negative value, add this to your code:

int sum_3 = W*H*D*100;
cout << sum_3 << endl;  // output: -2146967296

Here is the full code I used for some quick checks while writing this answer. I ran it with the gcc/g++ compiler on a 64-bit Linux machine.

If you run this on an 8-bit microcontroller such as an Arduino Uno, you'd get different results, since an int there is 2 bytes (the same width as int16_t) by default, instead! But, now that you understand the principles, you could figure out the expected result. (Also, as far as I know, 64-bit integer types such as uint64_t are still available with avr-gcc on that architecture; it is double that shrinks to 32 bits there.)

#include <iostream>
#include <cstdint>
using namespace std;

int main()
{
    
    int N,R,W,H,D;
    uint64_t sum = 0;
    uint64_t sum_2 = 0;
    // cin >> W >> H >> D;
    W = 200;
    H = 300;
    D = 358;
    sum  = static_cast<uint64_t>(W) * H * D * 100;
    sum_2  = W * H * D * 100;
    
    cout << sum << endl;
    cout << sum_2 << endl;
    
    int sum_3 = W*H*D*100;
    cout << sum_3 << endl;
    
    sum_2 = -1; // -1 converts to uint64_t max (well-defined, modulo 2^64)
    cout << sum_2 << endl;
    
    sum_2 = 18446744073709551615ULL - 2146967295;
    cout << sum_2 << endl;
    
    return 0;
}

Last note: never intentionally leave undefined behavior in your code. That is known as a bug. You do not have to write ISO C++, however! If you can find compiler documentation indicating that a certain behavior is well-defined, that's ok, so long as you know you are writing in the g++ language and not the C++ language, and don't expect your code to work the same across compilers. Here is an example where I do that: Using Unions for "type punning" is fine in C, and fine in gcc's C++ as well (as a gcc [g++] extension). I'm generally okay with relying on compiler extensions like this. Just be aware of what you're doing is all.

CodePudding user response：

Just a short version of @Gabriel Staples' good answer.

"and during arithmetic operations compiler chooses bigger type(which is uint64_t)"

There is no uint64_t in W * H * D * 100, just four int operands. After this multiplication, the int product (which overflowed and is UB) is assigned to a uint64_t.

Instead, use 100LLU * W * H * D to perform a wider unsigned multiplication.