Obtaining different results from sum() and '+'

Below is my experiment:

> xx = 293.62882204364098
> yy = 0.086783439604999998
> print(xx + yy, 20)
[1] 293.71560548324595175
> print(sum(c(xx,yy)), 20)
[1] 293.71560548324600859

It seems strange to me that sum() and + give different results when both are applied to the same numbers.

Is this result expected?

How can I get the same result?

Which one is more efficient?

CodePudding user response：

There is an r-devel thread here that includes some detailed description of the implementation. In particular, from Tomas Kalibera:

R uses long double type for the accumulator (on platforms where it is available). This is also mentioned in ?sum: "Where possible extended-precision accumulators are used, typically well supported with C99 and newer, but possibly platform-dependent."

This would imply that sum() is more accurate, although this comes with a giant flashing warning sign that if this level of accuracy is important to you, you should be very worried about the implementation of your calculations [in terms both of algorithms and underlying numerical implementations].
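
Whether the extended-precision accumulator is actually in play depends on how R was built. The following is a minimal sketch, assuming a reasonably recent R in which capabilities("long.double") and .Machine$sizeof.longdouble are available; the variable names are only for illustration. It reports the platform support and expresses the gap between the two results in units of the spacing between adjacent doubles (one ulp).

```r
xx <- 293.62882204364098
yy <- 0.086783439604999998

# TRUE if this R build uses a C long double type that is longer than double
has_long_double <- unname(capabilities("long.double"))
long_double_bytes <- .Machine$sizeof.longdouble

plus_result <- xx + yy          # ordinary double addition
sum_result  <- sum(c(xx, yy))   # may accumulate in extended precision

# Spacing between adjacent doubles near the result (one ulp)
ulp <- 2^(floor(log2(plus_result)) - 52)

cat("long double available:", has_long_double, "\n")
cat("sizeof(long double):  ", long_double_bytes, "bytes\n")
cat("difference in ulps:   ", abs(plus_result - sum_result) / ulp, "\n")
```

On a build without a wider long double, the reported difference would be expected to be zero; on others it should be at most about one ulp for this pair of numbers.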

I answered a question here where I eventually figured out (after some false starts) that the difference between + and sum() is due to the use of extended precision for sum().

This code shows that the sums of individual elements (as in sum(xx,yy)) are added together with + (in C), whereas this code is used to sum the individual components; line 154 (LDOUBLE s=0.0) shows that the accumulator is stored in extended precision (if available).
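
To see the effect of the accumulator on more than two numbers, the sketch below contrasts a plain R loop that adds with + (so every partial sum is rounded to double) against sum() on the same vector. The helper name loop_sum and the test vector are made up for illustration; whether the two totals differ, and by how much, depends on the platform.

```r
# Hypothetical helper: accumulate with `+`, rounding to double at every step
loop_sum <- function(x) {
  s <- 0
  for (v in x) s <- s + v
  s
}

x <- rep(0.1, 1e5)   # 0.1 is not exactly representable in binary

total_loop <- loop_sum(x)
total_sum  <- sum(x)   # may use a long double accumulator internally

print(total_loop, digits = 17)
print(total_sum, digits = 17)
print(total_loop - total_sum)
```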

I believe (but would be happy to be corrected) that @JonSpring's timing results are probably explained as follows: (1) sum(xx,yy) does more processing, type-checking etc. than +; (2) sum(c(xx,yy)) is slightly slower than sum(xx,yy) because it works in extended precision.

CodePudding user response：

Looks like addition is 3x as fast as summing, but unless you're doing high-frequency trading I can't see a situation where this would be your timing bottleneck.

xx = 293.62882204364098
yy = 0.086783439604999998

microbenchmark::microbenchmark(xx + yy, sum(xx,yy), sum(c(xx, yy)))
Unit: nanoseconds
           expr min    lq   mean median    uq  max neval
        xx + yy  88 102.5 111.90  107.0 110.0  352   100
    sum(xx, yy) 201 211.0 256.57  218.5 232.5 2886   100
 sum(c(xx, yy)) 283 297.5 330.42  304.0 311.5 1944   100

CodePudding user response：

In addition to the answer provided here, I want to present my thinking. This is how I made it clear to myself. Please correct me if I am wrong!

As far as I know, this is because a double has a 53-bit significand (about 16 significant decimal digits).

# both results are of type double
typeof(print(xx + yy, 20))
typeof(print(sum(c(xx,yy)), 20))

# printing with more than 16 significant digits shows different results
print(xx + yy, 17)
print(sum(c(xx,yy)), 17)

#[1] 293.71560548324595
#[1] 293.71560548324601

# printing with 16 or fewer significant digits shows identical results
print(xx + yy, 16)
print(sum(c(xx,yy)), 16)

#[1] 293.715605483246
#[1] 293.715605483246
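
Since the two results agree to about 16 significant digits, one way to treat them as the same is to compare with a tolerance rather than exactly. This is only a sketch using base R's all.equal() (whose default tolerance is about 1.5e-8) and format(); the names a and b are chosen for illustration.

```r
xx <- 293.62882204364098
yy <- 0.086783439604999998

a <- xx + yy
b <- sum(c(xx, yy))

# Exact comparison: may be FALSE where sum() uses extended precision
a == b

# Comparison with a tolerance: differences in the last bits count as equal
isTRUE(all.equal(a, b))

# Formatting to 15 significant digits gives the same text for both
format(a, digits = 15) == format(b, digits = 15)
```

If an exact match is really required, using the same operation (either + or sum()) on both sides of the comparison avoids the discrepancy in the first place.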
