Matlab "single" precision vs C floating point?


My Matlab script reads a string value "0.001044397222448" from a file, and after parsing the file, this value printed in the console shows as double precision:

  value_double =
      0.001044397222448

After I convert this number to single using value_float = single(value_double), the value shows as:

value_float = 
  0.0010444

What is the real value of this variable, that I later use in my Simulink simulation? Is it really truncated/rounded to 0.0010444?

My problem is that later on, after I compare this with analogous C code, I get differences. In the C code the value is read as float gf = 0.001044397222448f; and it prints out as 0.001044397242367267608642578125000. So the C code keeps good precision. But, does Matlab?

CodePudding user response:

The number 0.001044397222448 (like the vast majority of decimal fractions) cannot be exactly represented in binary floating point.

As a single-precision float, it's most closely represented as (hex) 0x0.88e428 × 2^-9, which in decimal is 0.001044397242367267608642578125.

In double precision, it's most closely represented as 0x0.88e427d4327300 × 2^-9, which in decimal is 0.001044397222447999984407118745366460643708705902099609375.

Those are what the numbers are, internally, in both C and Matlab.

Everything else you see is an artifact of how the numbers are printed back out, possibly rounded and/or truncated.

When I said that the single-precision representation "in decimal is 0.001044397242367267608642578125", that's mildly misleading, because it makes it look like there are 28 or more digits' worth of precision. Most of those digits, however, are an artifact of the conversion from base 2 back to base 10. As other answers have noted, single-precision floating point actually gives you only about 7 decimal digits of precision, as you can see if you notice where the single- and double-precision equivalents start to diverge:

0.001044397242367267608642578125
0.001044397222447999984407118745366460643708705902099609375
            ^
        difference

Similarly, double precision gives you roughly 16 decimal digits' worth of precision, as you can see if you compare the results of converting a few previous and next mantissa values:

0x0.88e427d43272f8  0.00104439722244799976756668424826557384221814572811126708984375
0x0.88e427d4327300  0.001044397222447999984407118745366460643708705902099609375
0x0.88e427d4327308  0.00104439722244800020124755324246734744519926607608795166015625
0x0.88e427d4327310  0.0010443972224480004180879877395682342466898262500762939453125
                                        ^
                                     changes

This also demonstrates why you can never exactly represent your original value 0.001044397222448 in binary. If you're using double, you can have 0.00104439722244799998, or you can have 0.0010443972224480002, but you can't have anything in between. (You'd get a little less close with float, and you could get considerably closer with long double, but you'll never get your exact value.)

In C, and whether you're using float or double, you can ask for as little or as much precision as you want when printing things with %f, and under a high-quality implementation you'll always get properly-rounded results. (Of course the results you get will always be the result of rounding the actual, internal value, not necessarily the decimal value you started with.) For example, if I run this code:

printf("%.5f\n", 0.001044397222448);
printf("%.10f\n", 0.001044397222448);
printf("%.15f\n", 0.001044397222448);
printf("%.20f\n", 0.001044397222448);
printf("%.30f\n", 0.001044397222448);
printf("%.40f\n", 0.001044397222448);
printf("%.50f\n", 0.001044397222448);
printf("%.60f\n", 0.001044397222448);
printf("%.70f\n", 0.001044397222448);

I see these results, which as you can see match the analysis above. (Note that this particular example is using double, not float.)

0.00104
0.0010443972
0.001044397222448
0.00104439722244799998
0.001044397222447999984407118745
0.0010443972224479999844071187453664606437
0.00104439722244799998440711874536646064370870590210
0.001044397222447999984407118745366460643708705902099609375000
0.0010443972224479999844071187453664606437087059020996093750000000000000

I'm not sure how Matlab prints things.


In answer to your specific questions:

What is the real value of this variable, that I later use in my Simulink simulation? Is it really truncated/rounded to 0.0010444?

As a float, it is really "truncated" to a number which, converted back to decimal, is exactly 0.001044397242367267608642578125. But as we've seen, most of those digits are essentially meaningless, and the result can more properly be thought of as being about 0.0010443972.

In the C code the value is read as float gf = 0.001044397222448f; and it prints out as 0.001044397242367267608642578125000

So C got the same answer I did -- but, again, most of those digits are not meaningful.

So the C code keeps good precision. But, does Matlab?

I'd be willing to bet that Matlab keeps the same internal precision for ordinary floats and doubles.

CodePudding user response:

MATLAB uses IEEE-754 binary64 for its double-precision type and binary32 for single-precision. When 0.001044397222448 is rounded to the nearest value representable in binary64, the result is 4816432068447840 × 2^-62 = 0.001044397222447999984407118745366460643708705902099609375.

When that is rounded to the nearest value representable in binary32, the result is 8971304 × 2^-33 = 0.001044397242367267608642578125.

Various software (C, Matlab, others) displays floating-point numbers in diverse ways, with more or fewer digits. The above values are the exact numbers represented by the floating-point data, per the IEEE 754 specification, and they are the values the data has when used in arithmetic operations.

CodePudding user response:

All single-precision representations should be the same

So here is the thing. According to the documentation, both MATLAB and C comply with the IEEE 754 standard, which means there should not be any difference between what is actually stored in memory.

You could compute the binary representation by hand, but according to this handy website (thanks @Danijel), the representation of 0.001044397222448 should be 0x3a88e428.

The question is: how precise is this representation? It is a bit tricky with floating point, but the short answer is that your number is accurate up to the 9th decimal place and has digits represented up to the 33rd decimal place. If you want the long answer, see the two paragraphs at the end of this post.

A display issue

The fact that you are not seeing the same thing when you print does not mean that you don't have the same bits in memory (and you should have the exact same bytes in memory in C and MATLAB). The only reason you see a difference on your display is because the print functions truncate your number. If you print the 33 decimals in each language you should not have any difference.

  • To do so in MATLAB use: fprintf('%.33f', value_float);
  • To do so in C use: printf("%.33f\n", gf);

About floating point precision

Now, in more detail, the question was: how precise is this representation? Well, the tricky thing with floating point is that the precision of the representation depends on what number you are representing. The representation uses 32 bits, divided into 1 bit for the sign, 8 for the exponent, and 23 for the fraction.

The number can be computed as (-1)^sign × 2^(exponent-127) × 1.fraction. This basically means that the maximal error/precision (depending on what you want to call it) is 2^(exponent-127-23); the 23 accounts for the 23 bits of the fraction. (There are a few edge cases, which I won't elaborate on.) In our case the stored exponent is 117, which means your precision is 2^(117-127-23) = 2^-33 = 1.16415321826934814453125e-10. That means that your single-precision float should represent your number accurately up to the 9th decimal place; after that it is up to luck.

Further details

I know this is a rather short explanation. For more details, this post explains the floating point imprecision more precisely and this website gives you some useful info and allows you to play visually with the representation.
