When a file is read through fread, some columns may be read as integer64 (correctly so), but when these are multiplied by a numeric, the result is not upcast to numeric (as it would be in C, or with integers in R). This is documented behavior in the bit64 package, but it is not intuitive: in multiplication and similar operations, integer64 behaves differently from integer.
Also, integer64 divided by an integer gives a numeric, so the two operators are inconsistent with each other.
Should we then always call fread with colClasses = "numeric" for columns that will be used in arithmetic expressions with numeric values?
File contents:
x,y
111,0.3
2147483648,0.3
> d <- fread(file)
> print(d$x*d$y)
x y
1: 111 0.3
2: 2147483648 0.3
> as.integer64(111) * 8e-2
integer64
[1] 9
> as.integer64(111) * 8 / 1e2
8.88
Similarly, quantiles and other R functions will not behave correctly with integer64. This issue creeps into all classes that build on integer64, such as nanotime.
Answer:
This is the documented behaviour of the bit64 package; see "Arithmetic precision and coercion" in ?bit64:
The fact that we introduce 64 bit long long integers – without introducing 128-bit long doubles – creates some subtle challenges.
The multiplication operator * coerces its first argument to integer64 but allows its second argument to be also double: the second argument is internally coerced to 'long double' and the result of the multiplication is returned as integer64.
as.integer64(111) * 8e-2
integer64
[1] 9
The division / and power ^ operators also coerce their first argument to integer64 and internally coerce their second argument to 'long double'; they return a double.
as.integer64(111) * 8 / 1e2
8.88
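As a minimal sketch of the coercion rules quoted above (assuming the bit64 package is installed; the variable names are just for illustration), converting the integer64 operand to double before multiplying keeps the fractional part. Keep in mind that a double represents integers exactly only up to 2^53, so very large integer64 values lose precision in the conversion.

```r
library(bit64)

x <- as.integer64(111)

# integer64 * double is returned as integer64, so the fraction is lost
truncated <- x * 8e-2
print(class(truncated))

# Converting to double first gives ordinary double arithmetic
exact <- as.double(x) * 8e-2
print(class(exact))
print(exact)
```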
To avoid this, you could set the integer64 parameter of fread to "double". Use this with care, as there is an open issue.
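A rough sketch of both options on the sample data from the question, assuming a data.table version whose fread accepts the text argument. The csv string and the names d, d2 and prod_xy are made up for the example. Either way the column arrives as a plain double, so the 2^53 exactness limit of doubles applies.

```r
library(data.table)

# Same contents as the file in the question, passed inline for illustration
csv <- "x,y\n111,0.3\n2147483648,0.3"

# Option 1: read every would-be integer64 column as double
d <- fread(text = csv, integer64 = "double")
prod_xy <- d$x * d$y   # ordinary double arithmetic
print(class(d$x))
print(prod_xy)

# Option 2: force only the column used in arithmetic to numeric
d2 <- fread(text = csv, colClasses = c(x = "numeric"))
print(class(d2$x))
```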