Why do we need std::numeric_limits::max_digits10?

I understand that floating point numbers are represented in memory in sign, exponent, and mantissa form, with a limited number of bits for each part, and this leads to rounding errors. Essentially, if I have a floating point number, then because of the limited number of bits it gets mapped to the nearest representable value using the rounding strategy.

Does this mean that two different floating point numbers can get mapped to the same memory representation? If yes, how can I avoid it programmatically?

I came across std::numeric_limits<T>::max_digits10.

It is described as the minimum number of decimal digits needed for a floating point number to survive a round trip from float to text to float.

Where does this round trip happen in a C++ program I write? As far as I understand, I have a float f1 which is stored in memory (probably with rounding error) and is read back. I can directly have another float variable f2 in the program and compare it with the original float f1. When will I need std::numeric_limits::max_digits10 in this use case? Is there a use case which shows that I need std::numeric_limits::max_digits10 to make sure I don't do things wrong?

Can anyone explain the above scenarios?

CodePudding user response:

You seem to be confusing two sources of rounding (and precision loss) with floating point numbers.

The first one is due to the way floating point numbers are represented in memory, which uses binary for the mantissa and exponent, as you pointed out. The classic example is:

#include <stdio.h>

int main() {
    const float a = 0.1f;
    const float b = 0.2f;
    const float c = a + b;

    printf("%.8f + %.8f = %.8f\n", a, b, c);
}

which will print

0.10000000 + 0.20000000 = 0.30000001

There, the mathematically correct result is 0.3, but 0.3 is not exactly representable in binary floating point. Instead, you get the closest number that can be represented.
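
To see that directly, you can print the stored values with more digits than the default (a minimal sketch; the exact digits shown assume a typical 32-bit IEEE float):

#include <stdio.h>

int main() {
    // Printing with 9 significant digits (max_digits10 for float)
    // reveals the nearest representable value actually stored.
    printf("0.1f is stored as %.9g\n", 0.1f);
    printf("0.2f is stored as %.9g\n", 0.2f);
    printf("0.3f is stored as %.9g\n", 0.3f);
}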

The other one, which is where max_digits10 comes into play, arises in the text representation of floating point numbers, for example when you printf a value or write it to a file.

When you do this using the %f format specifier you get the number printed out in decimal.

When you print the number in decimal you may decide how many digits get printed out. In some cases you might not get an exact printout of the actual number.

For example, consider

const float x = 10.0000095f;
const float y = 10.0000105f;
printf("x = %f ; y = %f\n", x, y);

this will print

x = 10.000010 ; y = 10.000010

On the other hand, increasing the precision of printf to 8 digits with %.8f will give you:

x = 10.00000954 ; y = 10.00001049

So if you saved these two float values as text to a file, using fprintf or an ofstream with the default number of digits, you would have saved the same value twice even though x and y originally held two different values.

max_digits10 is the answer to the question "how many decimal digits do I need to write in order to avoid this situation for all possible values?". In other words, if you write your float with max_digits10 digits (which happens to be 9 for float) and load it back, you are guaranteed to get the same value you started with.

Note that the decimal value written may be different from the floating point number's actual value (due to the different representation). But it is still guaranteed that when you read the text of the decimal number back into a float, you will get the same value.
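
Here is a minimal sketch of that guarantee (using string streams; a file written with the same precision behaves identically):

#include <cassert>
#include <iostream>
#include <limits>
#include <sstream>

int main() {
    const float original = 10.0000095f;

    // Write the float as text with max_digits10 significant digits.
    std::ostringstream out;
    out.precision(std::numeric_limits<float>::max_digits10); // 9 for float
    out << original;

    // Read the text back into a float.
    float restored = 0.0f;
    std::istringstream in(out.str());
    in >> restored;

    // The round trip recovers the exact original value.
    assert(restored == original);
    std::cout << "wrote \"" << out.str() << "\", round trip exact\n";
}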

CodePudding user response:

Forget about the exact representation for a minute, and pretend you have a two bit float. Bit 0 is 1/2, and bit 1 is 1/4. Let's say you want to transform this number into a string, such that when the string is parsed, it yields the original number.

Your possible numbers are 0, 1/4, 1/2, 3/4. Clearly you can represent all of them with two digits past the decimal point and get the same number back, since the representation is exact in this case. But can you get away with a single digit?

Assuming half always rounds up, the numbers map to 0, 0.3, 0.5, 0.8. The first and third numbers are exact while the second and fourth are not. So what happens when you try to parse them back?

0.3 - 0.25 < 0.5 - 0.3, and 0.8 - 0.75 < 1 - 0.8. So clearly in both cases the rounding works out. That means you only need one digit past the decimal point to capture the value of our contrived two-bit floats.
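
Here is a small sketch of that contrived two-bit float (the values array and the snap-to-quarter step belong to this thought experiment only; note that a real printf rounds ties to even rather than up, but the round trip still recovers every value):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    // The four values representable by the hypothetical two-bit float.
    const double values[] = {0.0, 0.25, 0.5, 0.75};

    for (double v : values) {
        // Print with a single digit past the decimal point...
        char text[16];
        snprintf(text, sizeof text, "%.1f", v);

        // ...parse it back, then snap to the nearest representable quarter.
        double parsed = strtod(text, NULL);
        double restored = round(parsed * 4.0) / 4.0;

        printf("%.2f -> \"%s\" -> %.2f\n", v, text, restored);
    }
}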

You can expand the number of bits from two to 53 (for a double), and add an exponent to alter the scale of the number, but the concept is exactly the same.

CodePudding user response:

Where does this round trip happen in a C++ program I write?

That depends on the code you write, but an obvious place would be... any floating-point literal you put in your code:

float f = 10.34529848505433;

Will f be exactly that number? No. It will be an approximation of that number because most implementations of float can't store that much precision. If you changed the literal to 10.34529848505432, odds are good f will have the same value.
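
You can check that quickly (assuming a typical 32-bit IEEE float, where both literals round to the same stored value):

#include <stdio.h>

int main() {
    // The literals differ only around the 16th significant digit,
    // far beyond float precision, so both round to the same value.
    const float f1 = 10.34529848505433f;
    const float f2 = 10.34529848505432f;

    printf("f1 == f2 ? %s\n", f1 == f2 ? "yes" : "no");
    printf("stored value: %.9g\n", f1); // 9 = max_digits10 for float
}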

This is not about round-tripping per se. The standard defines max_digits10 purely in terms of telling different float values apart in decimal:

Number of base 10 digits required to ensure that values which differ are always differentiated.
