fread reads large integers as integer64, which are not upcast to doubles in case of arithmetic ex

Time:02-25

When a file is read with fread, columns holding large integers may be read as integer64 (correctly so). However, when such a column is multiplied by a numeric, the result is not upcast to numeric, as it would be in C or for ordinary integers in R. This is documented behavior of the bit64 package, but it is not intuitive: in arithmetic such as multiplication, integer64 behaves differently from integer.

Also, dividing an integer64 by an integer gives a numeric result, so the behavior looks quite bizarre!

Should we then always call fread with colClasses = "numeric" for columns that will be used in arithmetic expressions with numerics?


    file contents
    x,y
    111,0.3
    2147483648,0.3

    > d <- fread(file)     
    > print(d$x*d$y)
                x       y
    1:        111       0.3
    2: 2147483648       0.3

    > as.integer64(111) * 8e-2
    integer64
    [1] 9
    > as.integer64(111) * 8 / 1e2
    8.88

Similarly, quantile and other R functions will not behave correctly with integer64. The issue carries over to every class built on integer64, such as nanotime.

CodePudding user response:

This is the documented behaviour of the bit64 package; see "Arithmetic precision and coercion" in ?bit64:

The fact that we introduce 64 bit long long integers – without introducing 128-bit long doubles – creates some subtle challenges

The multiplication operator * coerces its first argument to integer64 but allows its second argument to be also double: the second argument is internally coerced to 'long double' and the result of the multiplication is returned as integer64

    as.integer64(111) * 8e-2
    integer64
    [1] 9

The division / and power ^ operators also coerce their first argument to integer64 and coerce internally their second argument to 'long double', they return as double

    as.integer64(111) * 8 / 1e2
    8.88
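If the column has already been read as integer64 and the fractional part of a product matters, one option is to convert to double explicitly before multiplying. The following is only a minimal sketch: the variable names are made up for illustration, and it assumes the values fit in a double without loss (magnitudes below 2^53).

```r
library(bit64)

x <- as.integer64(111)

# Per the documentation quoted above, integer64 * double is returned as
# integer64, so the fractional part of the product is not kept.
int_result <- x * 8e-2

# Converting to double first gives ordinary double arithmetic.
# Assumption: the value is small enough (below 2^53) to be exact as a double.
dbl_result <- as.numeric(x) * 8e-2

print(class(int_result))
print(dbl_result)
```

The conversion discards the integer64 class, so it only makes sense where exact 64-bit integer semantics are no longer needed.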

To avoid this, you could set the integer64 argument of fread to "double". Use it with care, though, as there is an open issue.
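As a rough sketch of what this could look like with the sample data from the question (the inline text input and the object names d1, d2 and prod2 are illustrative assumptions, and the colClasses variant is the alternative raised in the question rather than part of the recommendation above):

```r
library(data.table)

# Same contents as the example file in the question, passed inline.
csv <- "x,y\n111,0.3\n2147483648,0.3\n"

# Option 1: have fread read would-be integer64 columns as double.
# Given the open issue mentioned above, check class(d1$x) on your version.
d1 <- fread(text = csv, integer64 = "double")
print(class(d1$x))

# Option 2: force one specific column to numeric via colClasses.
d2 <- fread(text = csv, colClasses = list(numeric = "x"))

# With x stored as double, the product keeps its fractional part.
prod2 <- d2$x * d2$y
print(prod2)
```

Either way, values above 2^53 can no longer be represented exactly once they are stored as doubles, so this trades exactness on very large integers for ordinary numeric arithmetic.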
