R data.table cross-join by three variables

Time:08-02

I'm trying to cross join a data.table by three variables (group, id, and date). The R code below does exactly what I want: each id within each group is expanded to include all of the dates_wanted. Is there a way to do the same thing more efficiently using the excellent data.table package?

library(data.table)

data <- data.table(
    group = c(rep("A", 10), rep("B", 10)),
    id    = c(rep("frank", 5), rep("tony", 5), rep("arthur", 5),  rep("edward", 5)),
    date  = seq(as.IDate("2020-01-01"), as.IDate("2020-01-20"), by = "day")
)

data

dates_wanted <- seq(as.IDate("2020-01-01"), as.IDate("2020-01-31"), by = "day")

names_A <- data[group == "A"][["id"]]

names_B <- data[group == "B"][["id"]]

names_A <- CJ(group = "A", id = names_A, date = dates_wanted, unique = TRUE)

names_B <- CJ(group = "B", id = names_B, date = dates_wanted, unique = TRUE)

alldates <- rbind(names_A, names_B)

alldates

data[alldates, on = .(group, id, date)]

CodePudding user response:

We can use do.call with CJ on id and dates_wanted, grouped by group:

out <- data[, do.call(CJ, c(.(id = id, date = dates_wanted),
      unique = TRUE)), group]

Checking against the result of the original approach:

> dim(out)
[1] 124   3
> out0 <- data[alldates, on = .(group, id, date)]
> dim(out0)
[1] 124   3
> all.equal(out, out0)
[1] TRUE
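
Since CJ accepts named arguments directly, the do.call wrapper may not be needed here. A minimal sketch, assuming the same data and dates_wanted as in the question (out2 is just an illustrative name):

```r
library(data.table)

# Same example data as in the question
data <- data.table(
    group = c(rep("A", 10), rep("B", 10)),
    id    = c(rep("frank", 5), rep("tony", 5), rep("arthur", 5), rep("edward", 5)),
    date  = seq(as.IDate("2020-01-01"), as.IDate("2020-01-20"), by = "day")
)
dates_wanted <- seq(as.IDate("2020-01-01"), as.IDate("2020-01-31"), by = "day")

# Within each group, cross join the unique ids with every wanted date
out2 <- data[, CJ(id = id, date = dates_wanted, unique = TRUE), by = group]

dim(out2)
```

Because unique = TRUE deduplicates id inside each group, every id contributes exactly one row per wanted date.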

CodePudding user response:

You can also group by group and id, and return dates_wanted for each pair:

data[, .(date=dates_wanted), .(group,id)]

Output:

     group     id       date
  1:     A  frank 2020-01-01
  2:     A  frank 2020-01-02
  3:     A  frank 2020-01-03
  4:     A  frank 2020-01-04
  5:     A  frank 2020-01-05
 ---                        
120:     B edward 2020-01-27
121:     B edward 2020-01-28
122:     B edward 2020-01-29
123:     B edward 2020-01-30
124:     B edward 2020-01-31
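
If the real table has more columns than the three shown, the expanded grid can be joined back to it, as the last line of the question does. A minimal sketch, assuming a hypothetical value column that is not part of the original data (grid and expanded are illustrative names too):

```r
library(data.table)

# Same example data as in the question
data <- data.table(
    group = c(rep("A", 10), rep("B", 10)),
    id    = c(rep("frank", 5), rep("tony", 5), rep("arthur", 5), rep("edward", 5)),
    date  = seq(as.IDate("2020-01-01"), as.IDate("2020-01-20"), by = "day")
)
data[, value := seq_len(.N)]  # hypothetical extra column, for illustration only

dates_wanted <- seq(as.IDate("2020-01-01"), as.IDate("2020-01-31"), by = "day")

# One row per (group, id, date) combination
grid <- data[, .(date = dates_wanted), by = .(group, id)]

# Keep every row of grid; dates that are absent from data get NA in value
expanded <- data[grid, on = .(group, id, date)]

expanded[id == "frank"][1:7]
```

Rows that exist in data keep their value, and all other dates appear with NA, so the gaps are easy to spot or fill afterwards.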