Differences between equivalent formulas

I was comparing two formulas for calculating percentages in R. Although they are mathematically equivalent and should produce exactly the same values (unless my math is gravely mistaken), they do not produce exactly the same results.

Here is an example:

set.seed(123)

a <- rnorm(100)

perc_1 <- (a / sum(a)) * 100
perc_2 <- (a * 100) / sum(a)

The answer depends on the function you use to check whether they are equal: all.equal(perc_1, perc_2) is TRUE, but all(perc_1 == perc_2) is FALSE. I understand why these two checks disagree: the latter tests exact equality, while the former tests near equality.

If I perform a summary of the difference, I get this:

summary(perc_1-perc_2)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-3.553e-15  0.000e+00  0.000e+00  1.818e-17  0.000e+00  3.553e-15

So, my question is: does anyone have an explanation for this discrepancy?

Thanks in advance.

CodePudding user response：

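It comes from how computers represent numbers. They are stored in binary with a finite number of bits, so most decimal fractions can only be approximated, and every arithmetic operation rounds its result. For most purposes that is fine, since few applications need more than about 15 significant digits. Your two formulas multiply and divide in a different order, so the rounding happens at different points and the final results can differ in the last bit.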
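As a small illustration of the binary approximation (a sketch, not part of the original question; the variable names `stored`, `exact_equal` and `near_equal` are made up for this example), you can print more digits than R shows by default and compare a simple sum both ways:

```r
# Print 0.1 with 20 decimal places to reveal the binary approximation
# that is actually stored (the trailing digits are platform-formatted).
stored <- sprintf("%.20f", 0.1)
print(stored)

# 0.1, 0.2 and 0.3 are all approximations, so the rounded sum
# does not land on exactly the same double as the literal 0.3.
exact_equal <- (0.1 + 0.2) == 0.3
print(exact_equal)   # FALSE

# all.equal() compares within a tolerance instead of bit for bit.
near_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))
print(near_equal)    # TRUE
```

This is the same pattern as in your example: `==` fails, `all.equal()` succeeds.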

Precision here means the number of significant digits a type can represent. For example, a 32-bit float offers about 7 significant decimal digits (counted across the whole number, not only after the decimal point). For better precision you need a 64-bit double, which offers about 15 to 17 significant digits. R's numeric vectors are doubles.
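You can inspect these limits from R itself. The following is a minimal sketch assuming a typical IEEE 754 platform; `.Machine` is part of base R, while the variable names are only illustrative:

```r
x <- 1.5
storage <- typeof(x)                     # R numerics are stored as "double"
print(storage)

# Smallest eps such that 1 + eps is distinguishable from 1 (2^-52 for doubles).
eps <- .Machine$double.eps
print(eps)

# Number of binary digits in the significand (53 for doubles).
mantissa_bits <- .Machine$double.digits
print(mantissa_bits)

# Adding a full eps to 1 is visible; adding half of it is rounded away.
sees_eps <- (1 + eps) > 1
loses_half_eps <- (1 + eps / 2) == 1
print(c(sees_eps, loses_half_eps))
```

53 binary digits correspond to roughly 15 to 17 decimal digits, which is where the figure for doubles comes from.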

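Now, you can see in your summary that the differences are about 3.5e-15, which for values of this size is around the 15th or 16th significant digit. That is expected, because a double cannot resolve anything finer than that. A type with more bits would push the discrepancy to a later digit: try a 128-bit representation and you'll see that precision improves, although any finite binary format still rounds. In practice, compare floating-point results with a tolerance, as all.equal does, rather than with ==. :)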
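To check that the discrepancy really is last-bit rounding, one option is to express the differences in units of the spacing between adjacent doubles (often called a ULP). This is a rough sketch built on the question's own example; `d` and `ulp` are names introduced here, and the ULP formula assumes ordinary non-zero doubles:

```r
set.seed(123)
a <- rnorm(100)

perc_1 <- (a / sum(a)) * 100
perc_2 <- (a * 100) / sum(a)

# Absolute difference between the two formulas, element by element.
d <- abs(perc_1 - perc_2)

# Approximate spacing between adjacent doubles near each value:
# 2^(exponent - 52), with the exponent taken from log2 of the magnitude.
ulp <- 2^(floor(log2(abs(perc_1))) - 52)

# Differences measured in ULPs; small whole numbers indicate pure rounding.
print(max(d / ulp))

# Each formula rounds twice, so the results should stay within a few ULPs.
within_few_ulps <- all(d <= 4 * ulp)
print(within_few_ulps)

# The tolerance-based comparison therefore treats the vectors as equal.
near_equal <- isTRUE(all.equal(perc_1, perc_2))
print(near_equal)
```

If the largest ratio is a small integer such as 1 or 2, the two formulas agree everywhere except in the final bit of some elements.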