Home > Software engineering >  batch store non-missing value in a new column in data.table by a reference data.table
batch store non-missing value in a new column in data.table by a reference data.table

Time:01-16

I have a dt.2 with columns including a, b, c, x, d and e. Some columns actually have the same meaning and could be used for imputation. For example, a, b and c equal to x while d and e equal to y. These replationships are recorded in another table dt. I want to impute NA value to x and y by associated columns respectively.

In order to do this in batch, I created a 2 dimention list(object result) based on Darren Tsai's answer here Turn a datatable into a two-dimensional list in R

Then, I intended to use a for loop but kept getting error. Not sure if my direction was correct. Or if I overcomplicated this problem. Thanks!

library(data.table)
# data.table of reference
dt <- data.table(col1 = c("a", "b", "c", "d", "e"),
                 col2 = c("x", "x", "x", "y", "y"))

# data.table to work with
dt.2 <- data.table(a = c(1,  NA, 4),
                   b = c(NA, NA, 2),
                   c = c(NA, NA, NA),
                   x = c(NA, 3,  NA),
                   d = c(5,  6,  NA),
                   e = c(NA, NA, 7))

# my current code
lst1 <- split(dt$col1, dt$col2)
lst2 <- as.list(names(lst1))
result <- list(unname(lst1), lst2)

for(i in 1:2){
  new.col <- result[[2]][i][[1]]
  old.cols <- unique(result[[1]][i][[1]])

  dt.2[, new.col := fcoalesce(old.cols)]
}

CodePudding user response:

Assuming that what you need is for x to be updated and for y to be created based on the existing values of the other column names.

First, c is logical when it needs to be as.numeric. A literal fix is merely

dt.2[, c := as.numeric(c)]

Try this:

spl <- split(dt$col1, dt$col2)
spl <- Map(c, names(spl), spl)
spl
# $x
# [1] "x" "a" "b" "c"
# $y
# [1] "y" "d" "e"

dt.2[, names(spl) := lapply(spl, function(cols) do.call(fcoalesce, .SD[, intersect(cols, names(.SD)), with = FALSE]))]
dt.2
#        a     b     c     x     d     e     y
#    <num> <num> <num> <num> <num> <num> <num>
# 1:     1    NA    NA     1     5    NA     5
# 2:    NA    NA    NA     3     6    NA     6
# 3:     4     2    NA     4    NA     7     7
  • Related