I have a function that returns NA under certain conditions and an integer otherwise (an integer vector in fact, but that doesn't matter here).
When I apply this function to groups of elements in a data.table and the first group returns NA, the whole column is set to logical, which corrupts the values from the following groups. How can I prevent this behaviour?
Example:
library(data.table)
myfun <- function(x) {
    if(x == 0) {
        return(NA)
    } else {
        return(x*2)
    }
}
DT <- data.table(x= c(0, 1, 2, 3), y= LETTERS[1:4])
DT
   x y
1: 0 A
2: 1 B
3: 2 C
4: 3 D
The following should assign the values c(NA, 2, 4, 6) to column x2. Instead, I get c(NA, TRUE, TRUE, TRUE) with warnings:
DT[, x2 := myfun(x), by= y]
Warning messages:
1: In `[.data.table`(DT, , `:=`(x2, myfun(x)), by = y) :
Group 2 column 'x2': 2.000000 (type 'double') at RHS position 1 taken as TRUE when assigning to type 'logical'
2: In `[.data.table`(DT, , `:=`(x2, myfun(x)), by = y) :
Group 3 column 'x2': 4.000000 (type 'double') at RHS position 1 taken as TRUE when assigning to type 'logical'
3: In `[.data.table`(DT, , `:=`(x2, myfun(x)), by = y) :
Group 4 column 'x2': 6.000000 (type 'double') at RHS position 1 taken as TRUE when assigning to type 'logical'
DT
   x y   x2
1: 0 A   NA
2: 1 B TRUE
3: 2 C TRUE
4: 3 D TRUE
Changing the order of the rows gives the expected result:
DT <- data.table(x= c(1, 2, 3, 0), y= LETTERS[1:4])
DT[, x2 := myfun(x), by= y]
DT
   x y x2
1: 1 A  2
2: 2 B  4
3: 3 C  6
4: 0 D NA
I can preset the type of column x2:
DT <- data.table(x= c(0, 1, 2, 3), y= LETTERS[1:4])
DT[, x2 := integer()]
DT[, x2 := myfun(x), by= y]
DT
   x y x2
1: 0 A NA
2: 1 B  2
3: 2 C  4
4: 3 D  6
but I wonder if there are better options that don't require me to set the column type beforehand.
This is with data.table v1.14.0, R 3.6.3
CodePudding user response:
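Do not let your function return NA, but NA_integer_ or NA_real_. A bare NA is of type logical, and the column takes its type from the first group's result, so the numeric values from later groups get coerced to logical.
Problem solved ;-)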
myfun <- function(x) {
    if(x == 0) {
        return(NA_integer_) #<-- !!
    } else {
        return(x*2)
    }
}
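To see why the typed NA helps, here is a minimal sketch that checks the types involved and reruns the grouped assignment. The names myfun_typed, myfun_plain and x3 are hypothetical and only used for illustration. Because x is a double column in the example, this sketch returns NA_real_ so that the NA matches the type of x*2. The last part shows a possible alternative, not taken from the answer above, for cases where the function itself cannot be changed: coerce its result at the call site.

```r
library(data.table)

# A bare NA is a logical constant; the typed variants carry the intended type.
typeof(NA)           # "logical"
typeof(NA_integer_)  # "integer"
typeof(NA_real_)     # "double"

# Hypothetical variant of myfun: returns a double NA, matching x * 2
# (x is a double column in the example data).
myfun_typed <- function(x) {
    if (x == 0) NA_real_ else x * 2
}

DT <- data.table(x = c(0, 1, 2, 3), y = LETTERS[1:4])
DT[, x2 := myfun_typed(x), by = y]
DT$x2                # NA 2 4 6

# Assumed alternative when the function cannot be edited:
# keep the bare NA inside the function and coerce the result per group.
myfun_plain <- function(x) {
    if (x == 0) NA else x * 2
}
DT[, x3 := as.numeric(myfun_plain(x)), by = y]
DT$x3                # NA 2 4 6
```

Either way the first group already yields a numeric NA, so the column is created as numeric and no warning is raised for the later groups.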