The deviations from the mean should always sum to 0. However, when the mean has many digits, possibly infinitely many as in this example where it is 20/7, R does not return exactly 0.
x <- c(1,2,2,3,3,4,5)
sum(x - mean(x))
[1] -4.440892e-16
I am quite new to R and have not found any information about this so far; maybe I was not searching for the right terms. Is it possible to calculate with infinitely long numbers in R? I am asking out of theoretical interest.
CodePudding user response:
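The problem you describe is not specific to R; it affects virtually every programming language that uses binary floating point. Internally, floats follow the IEEE 754 standard, which stores only a finite number of binary digits, so a value such as 20/7 has to be rounded. You can read more about it here.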
As far as I know there is no easy way around these small errors, except for using number representations with higher precision.
EDIT: R already uses the double-precision representation of floating-point numbers. To read more about it, have a look at the R FAQ and this SO question.
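If an exact zero is not required, a common workaround in base R is to compare against a tolerance rather than testing for equality. The snippet below is only a sketch of that idea using the vector from the question; the variable name s is just for illustration.

```r
x <- c(1, 2, 2, 3, 3, 4, 5)
s <- sum(x - mean(x))

# Exact comparison is fragile: rounding error can leave a tiny nonzero residue
s == 0

# Compare with a tolerance instead; all.equal() uses roughly 1.5e-8 by default
isTRUE(all.equal(s, 0))

# Machine epsilon for doubles, to get a sense of the scale of the residue
.Machine$double.eps
```

The residue reported in the question is on the order of .Machine$double.eps, which is why the tolerance-based check treats it as zero.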
CodePudding user response:
If you deal with rational numbers only, as in your example, you can use the gmp package.
You can use the Rmpfr package to work with numbers of arbitrary precision (which you have to set).
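For the rational case, a minimal sketch with gmp might look like the following. It assumes the package is installed and that as.bigq(), sum() and the arithmetic operators work on big rationals as described in the package documentation; the variable names are only illustrative.

```r
library(gmp)

# Store the data as exact rationals (numerator/denominator pairs)
x <- as.bigq(c(1, 2, 2, 3, 3, 4, 5))

# The mean is kept exactly as the fraction 20/7, with no rounding
m <- sum(x) / length(x)

# Sum of the deviations, computed in exact rational arithmetic
dev_sum <- sum(x - m)

# Should be TRUE, since no rounding takes place
dev_sum == 0
```

Because every intermediate value stays a fraction, the result does not depend on how many decimal digits the mean would need.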
Another possibility is the lazyNumbers package, freshly released on CRAN:
library(lazyNumbers)
# create a vector of lazy numbers
x <- lazyvec(c(1, 2, 2, 3, 3, 4, 5))
# compute its mean
m <- sum(x) / length(x)
# sum expected to be 0
y <- sum(x - m)
# convert it to double
as.double(y)
## 0