As a simplified example of my problem, say I have four data.tables
dt1
, ..., dt4
, all of which have the same structure:
head(dt1)
date x y
1: 2000-10-01 0.4527087 -0.11590788
2: 2001-10-01 0.7200252 -0.55722270
3: 2002-10-01 -1.3804472 -1.47030087
4: 2003-10-01 -0.1380225 2.34157766
5: 2004-10-01 -0.9288675 -1.32993998
6: 2005-10-01 -0.9592633 0.76316150
That is, they all have three columns called date
, x
and y
. My desired output is a merged data.table
(on date) with five columns: date
, and then the x
column from each individual table renamed to reflect its original data.table
:
head(desired_output)
date x_dt1 x_dt2 x_dt3 x_dt4
1: 2000-10-01 0.4527087 -0.11590788 1.1581946 -1.5159040
2: 2001-10-01 0.7200252 -0.55722270 -1.6247254 -0.3325556
3: 2002-10-01 -1.3804472 -1.47030087 -0.9766309 -0.2368857
4: 2003-10-01 -0.1380225 2.34157766 1.1831091 -0.4399184
5: 2004-10-01 -0.9288675 -1.32993998 0.8716144 -0.4086229
6: 2005-10-01 -0.9592633 0.76316150 -0.8860816 -0.4299365
I assume this can be done using the suffixes
argument of merge.data.table
somehow. I have tried to modify mergeDTs
from this answer without success as yet. A solution that successfully modifies mergeDTs
(or just using a function that could be applied to a list of of several data.tables
) would be excellent.
I am aware of this very slick dplyr/purrr answer but would prefer a data.table
solution.
Example data
library(data.table)
dt1 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
x = rnorm(11),
y = rnorm(11))
dt2 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
x = rnorm(11),
y = rnorm(11))
dt3 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
x = rnorm(11),
y = rnorm(11))
dt4 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
x = rnorm(11),
y = rnorm(11))
CodePudding user response:
Here is a possible approach accumulating the merged result in a simple for
loop:
library(data.table)
dt <- dt1[, .(date, x)]
for(i in 2:4) {
dt <- merge(dt, get(paste0("dt", i))[, .(date, x)], by = "date", suffixes = c("", paste0("_dt", i)))
}
setnames(dt, old = "x", new = "x_dt1")
head(dt)
#> date x_dt1 x_dt2 x_dt3 x_dt4
#> 1: 2000-10-01 -1.5035218 2.0463775 -0.120544283 -0.5662290
#> 2: 2001-10-01 0.5977386 -0.1968421 -0.840102174 1.2412272
#> 3: 2002-10-01 -0.9100557 -0.1687148 -1.738526471 1.3685767
#> 4: 2003-10-01 0.7027232 0.9009135 -0.247273205 1.3135718
#> 5: 2004-10-01 0.5269265 0.6176381 -0.007662592 -0.2928206
#> 6: 2005-10-01 -0.8350406 -0.7343245 -0.643701996 2.3068948
Or alternatively, accumulating the merged result with Reduce()
:
Reduce(
f = function(dt, dti) merge(dt, get(dti)[, .(date, x)], by = "date", suffixes = c("", paste0("_", dti))),
x = paste0("dt", 1:4),
init = dt1[, .(date, x)]
)[, x := NULL][]
Note: to get rid of the get()
calls we can collect all data.tables in a list before merging or write a small function wrapper, e.g.
merge_dts <- function(...) {
dts <- list(...)
dt <- dts[[1]][, .(date, x)]
for(i in seq_along(dts)[-1]) {
dt <- merge(dt, dts[[i]][, .(date, x)], by = "date", suffixes = c("", paste0("_dt", i)))
}
setnames(dt, old = "x", new = "x_dt1")
return(dt)
}
merge_dts(dt1, dt2, dt3, dt4)
CodePudding user response:
Another solution:
nms = paste0("dt", 1:4)
lapply(mget(nms), `[`, j=c("date", "x")) |>
Map(f=setnames, "x", paste0("x_", nms)) |>
Reduce(f=\(x,y) merge(x, y, by="date"))
date x_dt1 x_dt2 x_dt3 x_dt4
<Date> <num> <num> <num> <num>
1: 2000-10-01 1.5567877 -0.3579598 1.9765388 -0.76325891
2: 2001-10-01 -0.2656003 0.6599606 -0.8085454 -0.43846379
3: 2002-10-01 -0.3620652 -1.1952980 -0.7108491 -0.57261896
4: 2003-10-01 2.2358663 0.3903301 -1.0370673 1.27767750
5: 2004-10-01 0.7298361 -0.4111746 -0.5046095 0.35132983
6: 2005-10-01 -0.2749529 -0.3080462 -1.5936609 0.49925692
7: 2006-10-01 -0.4624282 -0.6598809 -2.0893062 -0.10963342
8: 2007-10-01 -1.3491373 1.1325984 -0.4708666 0.15541533
9: 2008-10-01 -1.9459169 0.3331132 -0.5091114 -0.56520321
10: 2009-10-01 -0.8994460 -0.5218689 -0.4177664 0.69639938
11: 2010-10-01 0.8663232 -1.6789679 -0.7827229 -0.05562114