Column type set by first element being evaluated in r/data.table

I have a function that returns NA under certain conditions and an integer otherwise (in fact an integer vector, but that doesn't matter here).

When I apply this function to groups of rows in a data.table with by, and the first group returns NA, the new column is created as logical, and the values returned by the following groups are coerced to logical (TRUE). How can I prevent this behaviour?

Example:

library(data.table)

myfun <- function(x) {
    if(x == 0) {
        return(NA)
    } else {
        return(x*2)
    }
}

DT <- data.table(x= c(0, 1, 2, 3), y= LETTERS[1:4])
DT
   x y
1: 0 A
2: 1 B
3: 2 C
4: 3 D

The following should assign to column x2 the values c(NA, 2, 4, 6). Instead, I get c(NA, TRUE, TRUE, TRUE) with warnings:

DT[, x2 := myfun(x), by= y]
Warning messages:
1: In `[.data.table`(DT, , `:=`(x2, myfun(x)), by = y) :
  Group 2 column 'x2': 2.000000 (type 'double') at RHS position 1 taken as TRUE when assigning to type 'logical'
2: In `[.data.table`(DT, , `:=`(x2, myfun(x)), by = y) :
  Group 3 column 'x2': 4.000000 (type 'double') at RHS position 1 taken as TRUE when assigning to type 'logical'
3: In `[.data.table`(DT, , `:=`(x2, myfun(x)), by = y) :
  Group 4 column 'x2': 6.000000 (type 'double') at RHS position 1 taken as TRUE when assigning to type 'logical'

DT
   x y   x2
1: 0 A   NA
2: 1 B TRUE
3: 2 C TRUE
4: 3 D TRUE
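
The warnings point at the cause: a bare NA in R is a logical constant. When := creates a new column by group, data.table appears to allocate the column with the type of the first group's result and coerce the later groups into it, so a leading NA makes the column logical. A quick base-R check of the types involved (nothing here is data.table-specific):

```r
# Type of a bare NA versus R's typed missing values
na_types <- c(
  bare      = typeof(NA),            # "logical"
  integer   = typeof(NA_integer_),   # "integer"
  double    = typeof(NA_real_),      # "double"
  character = typeof(NA_character_)  # "character"
)
print(na_types)

# Inside a single vector, c() promotes to the highest type present,
# so mixing a bare NA with doubles is harmless there:
typeof(c(NA, 2))  # "double"
```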

Changing the order of the rows, so that the group returning NA is no longer first, gives the expected result:

DT <- data.table(x= c(1, 2, 3, 0), y= LETTERS[1:4])
DT[, x2 := myfun(x), by= y]
DT
   x y x2
1: 1 A  2
2: 2 B  4
3: 3 C  6
4: 0 D NA

I can preset the type of column x2 first:

DT <- data.table(x= c(0, 1, 2, 3), y= LETTERS[1:4])
DT[, x2 := integer()]
DT[, x2 := myfun(x), by= y]
DT
   x y x2
1: 0 A NA
2: 1 B  2
3: 2 C  4
4: 3 D  6

but I wonder if there are better options that don't require me to set the column type beforehand.

This is with data.table v1.14.0, R 3.6.3

Answer:

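Don't let your function return a bare NA (which is of type logical); return a typed missing value such as NA_integer_ or NA_real_ instead, matching the type returned by the other branch. Problem solved ;-)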

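myfun <- function(x) {
  if (x == 0) {
    return(NA_integer_)  # <-- typed NA instead of a bare (logical) NA
  } else {
    return(x * 2)        # a double when x is a double; NA_real_ would match it exactly
  }
}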
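
A minimal sketch of the fix applied to the question's table, assuming the output should be double (as x * 2 is when x is double). It also shows a second option for when the function itself cannot be edited: coercing each group's result inside j with as.numeric(), which turns a bare NA into NA_real_. The names myfun_typed, DT1 and DT2 are illustrative only.

```r
library(data.table)

# Variant of the question's function with a typed NA (illustrative name).
# NA_real_ is used because x * 2 is a double here.
myfun_typed <- function(x) {
  if (x == 0) {
    return(NA_real_)
  } else {
    return(x * 2)
  }
}

DT1 <- data.table(x = c(0, 1, 2, 3), y = LETTERS[1:4])
DT1[, x2 := myfun_typed(x), by = y]  # first group is already double: no coercion warnings
print(DT1)

# Alternative when the function cannot be changed: keep the bare-NA version
# and coerce each group's result in j instead.
myfun <- function(x) {
  if (x == 0) {
    return(NA)
  } else {
    return(x * 2)
  }
}

DT2 <- data.table(x = c(0, 1, 2, 3), y = LETTERS[1:4])
DT2[, x2 := as.numeric(myfun(x)), by = y]  # as.numeric(NA) is NA_real_
print(DT2)
```

For the integer vectors the real function returns, NA_integer_ and as.integer() would be the corresponding choices.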