I have a question regarding conversion of integers:
#include <iostream>
#include <cstdint>
using namespace std;

int main()
{
    int N, R, W, H, D;
    uint64_t sum = 0;
    uint64_t sum_2 = 0;

    cin >> W >> H >> D;

    sum = static_cast<uint64_t>(W) * H * D * 100;
    sum_2 = W * H * D * 100;

    cout << sum << endl;
    cout << sum_2 << endl;

    return 0;
}
I thought that sum should be equal to sum_2, because the uint64_t type is bigger than the int type, and during arithmetic operations the compiler chooses the bigger type (which is uint64_t). So, by my understanding, sum_2 must have the uint64_t type. But it has the int type.
Can you explain to me why sum_2 was converted to int? Why didn't it stay uint64_t?
CodePudding user response:
Undefined behavior signed-integer overflow/underflow, and well-defined behavior unsigned-integer overflow/underflow, in C and C++
If I enter 200, 300, and 358 for W, H, and D, I get the following output, which makes perfect sense for my gcc compiler on a 64-bit Linux machine:
2148000000
18446744071562584320
Why does this make perfect sense?
Well, the default type is int, which is int32_t for the gcc compiler on a 64-bit Linux machine. Its max value is 2^32/2 - 1 = 2147483647, and its min value is -2147483648. The line sum_2 = W * H * D * 100; does int arithmetic, since that's the type of every operand there, 100 included, and no explicit cast is used. Only after doing that int arithmetic does it implicitly convert the int result to a uint64_t while storing it into the uint64_t sum_2 variable. The int arithmetic on the right-hand side, however, would mathematically produce 2148000000, which exceeds the max int value, so the multiplication overflows: undefined behavior which, in practice, wraps over the top of the max int value back down to the min int value and up again.
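To convince yourself that the destination type plays no part in how the right-hand side is evaluated, here is a minimal compile-time check (a sketch; the second assertion assumes a platform like 64-bit Linux where uint64_t has higher rank than int):
#include <cstdint>
#include <type_traits>

int main()
{
    int W = 200, H = 300, D = 358;
    (void)W; (void)H; (void)D;

    // The whole product is computed in plain int; the uint64_t on the left
    // side of the assignment has no influence on this.
    static_assert(std::is_same<decltype(W * H * D * 100), int>::value,
                  "right-hand side is evaluated as int");

    // Casting the first operand widens every subsequent multiplication.
    static_assert(std::is_same<decltype(static_cast<uint64_t>(W) * H * D * 100),
                               uint64_t>::value,
                  "explicit cast makes the whole chain uint64_t");
    return 0;
}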
Even though, according to the C and C++ standards, signed integer overflow or underflow is undefined behavior, I know that in the gcc compiler signed integer overflow happens to roll over to negative values. 2148000000 - 2147483647 = 516353 rollover counts. The first count up rolls over to the min int32_t value of -2147483648, and the next 516353 - 1 = 516352 counts go up to -2147483648 + 516352 = -2146967296. So, the result of W * H * D * 100 for the inputs above is now -2146967296, based on undefined behavior. Next, that value is implicitly converted from an int (int32_t in this case) to a uint64_t as it is stored into the uint64_t sum_2 variable, resulting in well-defined unsigned integer underflow (wraparound modulo 2^64). You start with -2146967296. The first count down from zero wraps around to the uint64_t max, which is 2^64 - 1 = 18446744073709551615. Now subtract the remaining 2146967296 - 1 = 2146967295 counts from that and you get 18446744073709551615 - 2146967295 = 18446744071562584320, just as shown above!
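You can retrace that arithmetic entirely with well-defined operations. Here is a small sketch that mimics, using unsigned math, what gcc happens to do for the signed overflow described above (the signed overflow itself is still UB; this just reproduces the observed wraparound):
#include <cstdint>
#include <iostream>
using namespace std;

int main()
{
    // 200*300*358*100 = 2148000000; unsigned arithmetic is always
    // well-defined, and this value even fits in a uint32_t without wrapping.
    uint32_t wrapped = 200u * 300u * 358u * 100u;

    // Reinterpreting that as int32_t gives 2148000000 - 2^32 = -2146967296
    // (implementation-defined before C++20; gcc does the modular wrap,
    // and C++20 guarantees it).
    int32_t as_int = static_cast<int32_t>(wrapped);

    // Converting the negative int32_t to uint64_t is always well-defined:
    // 2^64 - 2146967296 = 18446744071562584320.
    uint64_t widened = static_cast<uint64_t>(as_int);

    cout << wrapped << endl;  // 2148000000
    cout << as_int << endl;   // -2146967296
    cout << widened << endl;  // 18446744071562584320
    return 0;
}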
Voila! With a little compiler and hardware architecture understanding, and some expected but undefined behavior, the result is perfectly explainable and makes sense!
To easily see the negative value, add this to your code:
int sum_3 = W*H*D*100;
cout << sum_3 << endl; // output: -2146967296
Here is the full code I used for some quick checks while writing this answer. I ran it with the gcc/g++ compiler on a 64-bit Linux machine.
If you run this on an 8-bit microcontroller such as an Arduino Uno, you'd get different results, since an int is a 2-byte int16_t by default there instead! But now that you understand the principles, you can figure out the expected result; a rough desktop simulation of the 16-bit case follows after the code below. (Also, I think 64-bit values don't exist on that architecture, so they become 32-bit values.)
#include <iostream>
#include <cstdint>
using namespace std;

int main()
{
    int N, R, W, H, D;
    uint64_t sum = 0;
    uint64_t sum_2 = 0;

    // cin >> W >> H >> D;
    W = 200;
    H = 300;
    D = 358;

    sum = static_cast<uint64_t>(W) * H * D * 100;
    sum_2 = W * H * D * 100;
    cout << sum << endl;
    cout << sum_2 << endl;

    int sum_3 = W*H*D*100;
    cout << sum_3 << endl;

    sum_2 = -1; // underflow to uint64_t max
    cout << sum_2 << endl;

    sum_2 = 18446744073709551615ULL - 2146967295;
    cout << sum_2 << endl;

    return 0;
}
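As for the Arduino note above, here is a rough desktop simulation (a sketch, not something run on real AVR hardware): it emulates each 16-bit multiplication with well-defined unsigned math and then reinterprets the low 16 bits as a signed value, which is the wrap avr-gcc happens to produce, even though the signed overflow itself is still undefined behavior on that platform too.
#include <cstdint>
#include <iostream>
using namespace std;

// Emulate one 16-bit signed multiplication: multiply in 32-bit unsigned space
// (well-defined), keep only the low 16 bits, then reinterpret them as int16_t
// (implementation-defined before C++20; gcc does the expected modular wrap).
static int16_t mul16(int16_t a, int16_t b)
{
    uint32_t wide = static_cast<uint32_t>(static_cast<uint16_t>(a)) *
                    static_cast<uint16_t>(b);
    return static_cast<int16_t>(static_cast<uint16_t>(wide));
}

int main()
{
    int16_t w = 200, h = 300, d = 358;
    int16_t product = mul16(mul16(mul16(w, h), d), 100);
    cout << product << endl; // -7936 with these inputs (on gcc's wrap behavior)
    return 0;
}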
Last note: never intentionally leave undefined behavior in your code. That is known as a bug. You do not have to write ISO C++, however! If you can find compiler documentation indicating a certain behavior is well-defined, that's okay, so long as you know you are writing in the g++ language and not the C++ language, and don't expect your code to work the same across compilers. Here is an example where I do that: Using Unions for "type punning" is fine in C, and fine in gcc's C++ as well (as a gcc [g++] extension). I'm generally okay with relying on compiler extensions like this. Just be aware of what you're doing is all.
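For instance, here is a minimal type-punning sketch along those lines (an illustration, not the linked answer's exact code; it assumes IEEE-754 float and relies on gcc's documented union-punning support in C++):
#include <cstdint>
#include <iostream>
using namespace std;

union FloatBits
{
    float    f;
    uint32_t u;
};

int main()
{
    FloatBits fb;
    fb.f = 1.0f;
    // Reading `u` after writing `f` is ISO C, and a documented gcc/g++
    // extension in C++, but it is not ISO C++.
    cout << hex << showbase << fb.u << endl; // 0x3f800000 for an IEEE-754 float
    return 0;
}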
CodePudding user response:
Just a short version of @Gabriel Staples' good answer.
"and during arithmetic operations compiler chooses bigger type(which is uint64_t)"
There is no uin64_t
in W * H * D * 100
, just four int
. After this multiplication, the int
product (which overflowed and is UB) is assigned to an uint64_t
.
Instead, use 100LLU * W * H * D
to perform a wider unsigned multiplication.
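A quick check of that fix (a minimal sketch): making the first operand an unsigned 64-bit value forces the whole left-to-right chain of multiplications into 64 bits, so nothing overflows.
#include <cstdint>
#include <iostream>
using namespace std;

int main()
{
    int W = 200, H = 300, D = 358;

    // ((100LLU * W) * H) * D: every step is done in unsigned long long
    // (64-bit), because the usual arithmetic conversions widen each int operand.
    uint64_t sum_2 = 100LLU * W * H * D;

    cout << sum_2 << endl; // 2148000000, the mathematically correct result
    return 0;
}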