Function inside a function, access previous environment

I am trying to access a data frame created inside one function from another function that it calls, by reaching into the calling function's environment. I know the ideal is to pass the data frame as an argument, but I am trying to avoid that so I don't have to list hundreds of inputs. Any help or resources are appreciated.

An example is below:

library(dplyr)

data_v1 <- tribble(~var1, ~var2,
                    1, 2)

postprocess_data <- function(){
  
  data_v3 <- data_v2 %>% 
    mutate(var2 = var2*3)
  
  data_v3
}


process_data <- function(){
  
  data_v2 <- data_v1 %>% 
    mutate(var1 = var1*3)
  
  data_v3_inside <- postprocess_data()
  
  data_v3_inside
    
}

process_data()

CodePudding user response:

We can get the object from the calling function's frame, which parent.frame() returns:

postprocess_data <- function(){
  
  data_v2 <- get("data_v2", envir = parent.frame())
  data_v3 <- data_v2 %>% 
    mutate(var2 = var2*3)
  
  data_v3
}


process_data <- function(){
  
  data_v2 <- data_v1 %>% 
    mutate(var1 = var1*3)
  
  data_v3_inside <- postprocess_data()
  
  data_v3_inside
    
}

Testing:

process_data()
# A tibble: 1 × 2
   var1  var2
  <dbl> <dbl>
1     3     6
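
One caveat with get() is that, by default, it keeps searching the enclosing environments when the name is missing from the caller's frame, so a stray global data_v2 could be picked up silently. A minimal sketch of a stricter lookup follows. It uses base R only (a plain data.frame stands in for the tibble), and the explicit exists() check and error message are assumptions added for illustration, not part of the answer above.

```r
# Base R stand-in for the tibble used in the question.
data_v1 <- data.frame(var1 = 1, var2 = 2)

postprocess_data <- function() {
  caller <- parent.frame()
  # inherits = FALSE restricts the lookup to the caller's own frame,
  # so a data_v2 defined elsewhere is never used by accident.
  if (!exists("data_v2", envir = caller, inherits = FALSE)) {
    stop("data_v2 is not defined in the calling function")
  }
  data_v2 <- get("data_v2", envir = caller, inherits = FALSE)
  data_v2$var2 <- data_v2$var2 * 3
  data_v2
}

process_data <- function() {
  data_v2 <- data_v1
  data_v2$var1 <- data_v2$var1 * 3
  postprocess_data()
}

result <- process_data()
```

Here result holds var1 = 3 and var2 = 6, matching the tibble output above.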

CodePudding user response:

Your title describes the right way to do this.

Since process_data only needs to access data_v1, it's fine as it is (though it would be better style if data_v1 was an argument).

But postprocess_data needs access to data_v2, a local variable in process_data. So the ideal design is to define postprocess_data inside process_data. Then it will be able to see all of the variables that are local to process_data, as well as all global variables. For example,

library(dplyr)

data_v1 <- tribble(~var1, ~var2,
                    1, 2)


process_data <- function(){

  postprocess_data <- function(){
  
    data_v3 <- data_v2 %>% 
      mutate(var2 = var2*3)
  
    data_v3
  }


  data_v2 <- data_v1 %>% 
    mutate(var1 = var1*3)
  
  data_v3_inside <- postprocess_data()
  
  data_v3_inside
    
}

process_data()
#> # A tibble: 1 × 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     3     6

Created on 2022-03-23 by the reprex package (v2.0.1)

Edited to add: There's one slightly risky thing in the way I wrote the code here.

If you ever modified process_data() so that it called postprocess_data() before creating data_v2, then the nested function would not find it in the enclosing environment, and would keep looking for it in the global environment. If it happened to find a copy there you might end up with a subtle bug that caused you trouble.

So a good idea is to create any variable used by the nested function very early, e.g. setting data_v2 <- NULL before the definition of postprocess_data. The NULL value should trigger an error if you haven't replaced it at the time you call the nested function.
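
The placeholder idea can be sketched as follows. This is an illustration in base R only (a data.frame instead of a tibble). The call_too_early argument and the explicit is.null() check are hypothetical additions that make the failure visible with a clear message; the original suggestion relies on NULL causing an error on its own.

```r
data_v1 <- data.frame(var1 = 1, var2 = 2)

process_data <- function(call_too_early = FALSE) {
  # Placeholder created before the nested function is defined, so the
  # lookup stops here instead of continuing to the global environment.
  data_v2 <- NULL

  postprocess_data <- function() {
    # Explicit guard (an assumption for this sketch).
    if (is.null(data_v2)) stop("data_v2 has not been created yet")
    out <- data_v2
    out$var2 <- out$var2 * 3
    out
  }

  # Hypothetical switch that simulates calling the nested function too soon.
  if (call_too_early) return(postprocess_data())

  data_v2 <- data_v1
  data_v2$var1 <- data_v2$var1 * 3
  postprocess_data()
}

ok <- process_data()
early <- tryCatch(
  process_data(call_too_early = TRUE),
  error = function(e) conditionMessage(e)
)
```

With this layout, ok contains the processed data, while early holds the error message instead of a result silently computed from some global data_v2.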

CodePudding user response:

1) Assuming that you can modify process_data, insert the line below as the first line of its body. It makes a local copy of postprocess_data whose environment is the frame of the running process_data, so any free variables in postprocess_data are looked up there.

environment(postprocess_data) <- environment()

2) If you cannot change process_data but do control postprocess_data, replace postprocess_data with this.

  postprocess_data <- function(envir = parent.frame()) with(envir, {       
    data_v3 <- data_v2 %>% 
      mutate(var2 = var2*3)
  
    data_v3
  })

2a) Or with this, which makes it act somewhat like a macro.

  postprocess_data <- function(envir = parent.frame()) eval(substitute({
  
    data_v3 <- data_v2 %>% 
      mutate(var2 = var2*3)
  
    data_v3
  }), envir = envir)
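
For completeness, option 1 can be sketched in full context as below. This is a minimal base R illustration (a data.frame instead of a tibble, and `$` assignments instead of mutate()). The env_before, env_after and unchanged variables are hypothetical names used only to show that the top-level postprocess_data is left untouched.

```r
data_v1 <- data.frame(var1 = 1, var2 = 2)

postprocess_data <- function() {
  # data_v2 is a free variable: it is resolved through this
  # function's enclosing environment.
  out <- data_v2
  out$var2 <- out$var2 * 3
  out
}

process_data <- function() {
  # Creates a local copy of postprocess_data whose enclosing
  # environment is the frame of this call.
  environment(postprocess_data) <- environment()

  data_v2 <- data_v1
  data_v2$var1 <- data_v2$var1 * 3

  # Calls the local copy, which can see data_v2.
  postprocess_data()
}

env_before <- environment(postprocess_data)
result <- process_data()
env_after <- environment(postprocess_data)

# The original function keeps its environment; only the copy changed.
unchanged <- identical(env_before, env_after)
```

Because the assignment inside process_data only rebinds the name locally, the original function is not modified and other callers are unaffected.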