I have a dt.2
with columns including a
, b
, c
, x
, d
and e
. Some columns actually have the same meaning and could be used for imputation. For example, a
, b
and c
equal to x
while d
and e
equal to y
. These replationships are recorded in another table dt
.
I want to impute NA value to x
and y
by associated columns respectively.
In order to do this in batch, I created a 2 dimention list(object result
) based on Darren Tsai's answer here Turn a datatable into a two-dimensional list in R
Then, I intended to use a for
loop but kept getting error. Not sure if my direction was correct. Or if I overcomplicated this problem. Thanks!
library(data.table)
# data.table of reference
dt <- data.table(col1 = c("a", "b", "c", "d", "e"),
col2 = c("x", "x", "x", "y", "y"))
# data.table to work with
dt.2 <- data.table(a = c(1, NA, 4),
b = c(NA, NA, 2),
c = c(NA, NA, NA),
x = c(NA, 3, NA),
d = c(5, 6, NA),
e = c(NA, NA, 7))
# my current code
lst1 <- split(dt$col1, dt$col2)
lst2 <- as.list(names(lst1))
result <- list(unname(lst1), lst2)
for(i in 1:2){
new.col <- result[[2]][i][[1]]
old.cols <- unique(result[[1]][i][[1]])
dt.2[, new.col := fcoalesce(old.cols)]
}
CodePudding user response:
Assuming that what you need is for x
to be updated and for y
to be created based on the existing values of the other column names.
First, c
is logical
when it needs to be as.numeric
. A literal fix is merely
dt.2[, c := as.numeric(c)]
Try this:
spl <- split(dt$col1, dt$col2)
spl <- Map(c, names(spl), spl)
spl
# $x
# [1] "x" "a" "b" "c"
# $y
# [1] "y" "d" "e"
dt.2[, names(spl) := lapply(spl, function(cols) do.call(fcoalesce, .SD[, intersect(cols, names(.SD)), with = FALSE]))]
dt.2
# a b c x d e y
# <num> <num> <num> <num> <num> <num> <num>
# 1: 1 NA NA 1 5 NA 5
# 2: NA NA NA 3 6 NA 6
# 3: 4 2 NA 4 NA 7 7