I am trying to access a data frame that is created inside one function from a second function that the first one calls, using the first function's environment. I know the ideal is to pass it as an argument to the second function, but I am trying to avoid that so I don't have to list hundreds of inputs. Any help or resources are appreciated.
An example is below:
library(dplyr)
data_v1 <- tribble(~var1, ~var2,
                   1, 2)
postprocess_data <- function(){
  data_v3 <- data_v2 %>%
    mutate(var2 = var2*3)
  data_v3
}
process_data <- function(){
  data_v2 <- data_v1 %>%
    mutate(var1 = var1*3)
  data_v3_inside <- postprocess_data()
  data_v3_inside
}
process_data()
CodePudding user response:
We can get the object from the calling function's frame with parent.frame():
postprocess_data <- function(){
  data_v2 <- get("data_v2", envir = parent.frame())
  data_v3 <- data_v2 %>%
    mutate(var2 = var2*3)
  data_v3
}
process_data <- function(){
  data_v2 <- data_v1 %>%
    mutate(var1 = var1*3)
  data_v3_inside <- postprocess_data()
  data_v3_inside
}
Testing:
process_data()
# A tibble: 1 × 2
var1 var2
<dbl> <dbl>
1 3 6
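One thing worth keeping in mind with this approach is that parent.frame() refers to whichever function made the call, so the lookup depends on the call path. The sketch below illustrates that under a few assumptions: base R's data.frame() and transform() stand in for tribble() and mutate() so that it runs without packages, and helper and process_data_indirect are hypothetical names added only for the demonstration.

```r
# Base-R stand-in for the question's data (assumption: no dplyr needed here).
data_v1 <- data.frame(var1 = 1, var2 = 2)

postprocess_data <- function() {
  # Start the search for data_v2 in the frame of whoever called this function.
  data_v2 <- get("data_v2", envir = parent.frame())
  transform(data_v2, var2 = var2 * 3)
}

process_data <- function() {
  data_v2 <- transform(data_v1, var1 = var1 * 3)
  postprocess_data()   # direct call: the caller's frame holds data_v2
}

# Hypothetical intermediate function defined at top level.
helper <- function() postprocess_data()

process_data_indirect <- function() {
  data_v2 <- transform(data_v1, var1 = var1 * 3)
  helper()             # the caller of postprocess_data is now helper, not this function
}

direct <- process_data()

# The indirect call fails because neither helper's frame nor its enclosing
# environments contain data_v2; the error message is captured as a string.
indirect <- tryCatch(process_data_indirect(),
                     error = function(e) conditionMessage(e))
```

So the parent.frame() lookup is convenient when postprocess_data is always called directly by the function that owns data_v2, and more fragile once other functions sit in between.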
CodePudding user response:
Your title describes the right way to do this.
Since process_data only needs to access data_v1, it's fine as it is (though it would be better style if data_v1 was an argument).
But postprocess_data needs access to data_v2, a local variable in process_data. So the ideal design is to define postprocess_data inside process_data. Then it will be able to see all of the variables that are local to process_data, as well as all global variables. For example,
library(dplyr)
data_v1 <- tribble(~var1, ~var2,
                   1, 2)
process_data <- function(){
  postprocess_data <- function(){
    data_v3 <- data_v2 %>%
      mutate(var2 = var2*3)
    data_v3
  }
  data_v2 <- data_v1 %>%
    mutate(var1 = var1*3)
  data_v3_inside <- postprocess_data()
  data_v3_inside
}
process_data()
#> # A tibble: 1 × 2
#> var1 var2
#> <dbl> <dbl>
#> 1 3 6
Created on 2022-03-23 by the reprex package (v2.0.1)
Edited to add: There's one slightly risky thing in the way I wrote the code here.
If you ever modified process_data() so that it called postprocess_data() before creating data_v2, then the nested function would not find it in the enclosing environment, and would keep looking for it in the global environment. If it happened to find a copy there you might end up with a subtle bug that caused you trouble.
So a good idea is to create any variable used by the nested function very early, e.g. setting data_v2 <- NULL before the definition of postprocess_data. The NULL value should trigger an error if you haven't replaced it at the time you call the nested function.
CodePudding user response:
1) Assuming that you have control over process_data and can modify it, insert this as the first line of its body. That will make a copy of postprocess_data whose environment is the frame of the running process_data, so that any free variables in postprocess_data will be looked up there.
environment(postprocess_data) <- environment()
2) If you don't have control over process_data and cannot change it, but do have control over postprocess_data, replace it with this:
postprocess_data <- function(envir = parent.frame()) with(envir, {
  data_v3 <- data_v2 %>%
    mutate(var2 = var2*3)
  data_v3
})
2a) Or replace it with this, which makes it act somewhat like a macro:
postprocess_data <- function(envir = parent.frame()) eval(substitute({
  data_v3 <- data_v2 %>%
    mutate(var2 = var2*3)
  data_v3
}), envir = envir)
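To see options 1 and 2 end to end, here is a rough sketch under a few assumptions: base R's data.frame() and transform() replace tribble() and mutate() so it runs without packages, and the names postprocess_free, postprocess_with, process_data_1, process_data_2 and custom_env are hypothetical, chosen only so both variants can live side by side.

```r
data_v1 <- data.frame(var1 = 1, var2 = 2)

# Option 1: the function has a free variable, and the caller re-points
# a local copy of it at its own frame.
postprocess_free <- function() {
  transform(data_v2, var2 = var2 * 3)   # data_v2 is a free variable here
}

process_data_1 <- function() {
  # Creates a local copy of postprocess_free whose enclosing environment
  # is the frame of this call; the top-level function is left unchanged.
  environment(postprocess_free) <- environment()
  data_v2 <- transform(data_v1, var1 = var1 * 3)
  postprocess_free()
}

# Option 2: the function evaluates its body in the caller's frame by default,
# or in whatever environment is passed as envir.
postprocess_with <- function(envir = parent.frame()) with(envir, {
  transform(data_v2, var2 = var2 * 3)
})

process_data_2 <- function() {
  data_v2 <- transform(data_v1, var1 = var1 * 3)
  postprocess_with()
}

result_1 <- process_data_1()
result_2 <- process_data_2()

# envir can also be supplied explicitly, e.g. a hand-built environment.
custom_env <- new.env()
assign("data_v2", data.frame(var1 = 10, var2 = 20), envir = custom_env)
result_3 <- postprocess_with(custom_env)
```

The explicit envir argument in option 2 also makes the function easier to test in isolation, since the data it reads can be supplied through a small environment built by hand.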