I am interested in joining two data.tables inside a function. However, when using the new `env` argument for programming on data.table, I am unable to join the tables because the column I try to join on does not exist, i.e. I get an "argument specifying columns received non-existing columns" error. How can I programmatically pass the join column into a function that joins two data.tables? A minimal working example of this surprising failure is below.
```r
library(data.table)
library(magrittr)

dt.mwe.1 <- data.table(a = c(1,2,3,4,0,10))
mwe_function = function(dt, merge_var){
  dt.internal =
    data.table(z = min(dt):max(dt)) %>%
    .[ , .(mv = z) , env = list(mv = merge_var)] %>% `[`
  dt.internal2 =
    data.table(z = min(dt):max(dt)) %>%
    .[ , .(mv = z) , env = list(mv = merge_var)] %>% `[`
  dt.internal
  dt.internal[dt.internal2, on = .(mv),
              env = list(mv = merge_var)] %>% `[`
}
# fails
mwe_function(dt = dt.mwe.1, merge_var = "a")
# also fails
mwe_function(dt = dt.mwe.1, merge_var = a)
```
Answer:
Maybe I am missing your point, but what about:
```r
mwe_function = function(dt, merge_var){
  dt.internal =
    data.table(z = min(dt):max(dt)) %>%
    .[ , .(mv = z) , env = list(mv = merge_var)] %>% `[`
  dt.internal2 =
    data.table(z = min(dt):max(dt)) %>%
    .[ , .(mv = z) , env = list(mv = merge_var)] %>% `[`
  dt.internal
  dt.internal[dt.internal2, on = merge_var] %>% `[`
}
mwe_function(dt = dt.mwe.1, merge_var = "a")
#         a
#     <int>
#  1:     0
#  2:     1
#  3:     2
#  4:     3
#  5:     4
#  6:     5
#  7:     6
#  8:     7
#  9:     8
# 10:     9
# 11:    10
```
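For reference, here is a pipe-free sketch of the same fix, assuming data.table 1.14.2 or later (the release that introduced `env`). The helper name `join_on_var` and the local tables `x` and `i` are hypothetical; the point is that `env` renames the column in `j`, while the join column goes to `on` as a plain string.

```r
library(data.table)  # env= requires data.table >= 1.14.2

# Hypothetical helper: builds two one-column tables whose column name is
# given by `merge_var`, then joins them on that column.
join_on_var <- function(dt, merge_var) {
  rng <- min(dt):max(dt)
  # env substitutes the symbol `mv`, including its use as a name inside .()
  x <- data.table(z = rng)[, .(mv = z), env = list(mv = merge_var)]
  i <- data.table(z = rng)[, .(mv = z), env = list(mv = merge_var)]
  # `on` is not handled by env, but it accepts a character vector directly
  x[i, on = merge_var]
}

dt.mwe.1 <- data.table(a = c(1, 2, 3, 4, 0, 10))
res <- join_on_var(dt.mwe.1, "a")
print(res)
```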
From the help page `?data.table`:

> env: List or an environment, passed to ‘substitute2’ for substitution of parameters in ‘i’, ‘j’ and ‘by’ (or ‘keyby’). Use ‘verbose’ to preview constructed expressions.
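The substitution that `env` performs can be previewed by calling `substitute2()` directly. This sketch assumes data.table 1.14.2 or later; it shows that a character value in `env` becomes a symbol and that argument names inside `.()` are substituted as well, which is why `.(mv = z)` yields a column named `a`. The `on` argument is not in the list of substituted parameters quoted above.

```r
library(data.table)  # substitute2() ships with data.table >= 1.14.2

# Same substitution that env = list(mv = "a") applies to j:
# the character "a" is turned into a symbol, and the argument name is replaced too.
j_expr <- substitute2(.(mv = z), env = list(mv = "a"))
print(j_expr)
```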
So I guess the `env` approach does not work on the `on` argument, which, however, accepts strings as input anyway.
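A minimal sketch of passing the join column to `on` as a string; the tables `x`, `i` and `i2` and their values are made up for illustration. The last join uses a named character vector, `c(a = "b")`, where the name is the column in `x` and the value is the column in `i`, which covers the case where the key columns are named differently in the two tables.

```r
library(data.table)

merge_var <- "a"
x <- data.table(a = 1:3, val_x = c("p", "q", "r"))
i <- data.table(a = 2:4, val_i = c("s", "t", "u"))

# on takes the column name as a string, so the variable is passed directly;
# rows of i with no match in x (a = 4) get NA for val_x
ok <- x[i, on = merge_var]
print(ok)

# Differently named key columns: name = column in x, value = column in i
i2 <- setnames(copy(i), "a", "b")
ok2 <- x[i2, on = c(a = "b")]
print(ok2)
```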