I need to fill each row of a matrix with '1' between 'start' and 'end' columns, where the 'start' and 'end' column names (dates in the real data) are specified for each 'id' in two columns of the matrix.
e.g.
d<- data.table(id = paste0(LETTERS[1:4], 1:4), a = rep(c("c", "d"), 2), b = rep(c("e", "f"),2), c = NA, d = NA, e=NA, f=NA)
id a b c d e f
A1 c e NA NA NA NA
B2 d f NA NA NA NA
C3 c e NA NA NA NA
D4 d f NA NA NA NA
The result would be this:
id a b c d e f
A1 c e 1 1 1 NA
B2 d f NA 1 1 1
C3 c e 1 1 1 NA
D4 d f NA 1 1 1
The real data uses dates as the start and end values and I'm working with close to 20M rows of data, so a data.table solution is preferred. Thanks in advance!
CodePudding user response:
Since c:f
are previously all NA
, I suggest they can be discard/overwritten immediately. Try this:
# not necessary if you have a better `seq` variant for your real data
seq.character <- function(from, to, ...) letters[seq(match(tolower(from), letters), match(tolower(to), letters), ...)]
seq('d', 'g')
# [1] "d" "e" "f" "g"
newd <- rbindlist(Map(function(...) {
o <- seq.character(...)
setNames(as.list(rep(1L, length(o))), o)
}, d$a, d$b), fill = TRUE, use.names = TRUE)
newd
# c d e f
# <int> <int> <int> <int>
# 1: 1 1 1 NA
# 2: NA 1 1 1
# 3: 1 1 1 NA
# 4: NA 1 1 1
cbind(d[,1:3], newd)
# id a b c d e f
# <char> <char> <char> <int> <int> <int> <int>
# 1: A1 c e 1 1 1 NA
# 2: B2 d f NA 1 1 1
# 3: C3 c e 1 1 1 NA
# 4: D4 d f NA 1 1 1