Function inside a function, access previous environment

Time:03-24

I am trying to access a data frame created within one function from another function that it calls, by using the calling function's environment. I know the ideal approach is to pass it as an argument, but I am trying to avoid that so I don't have to list hundreds of inputs. Any help or resources are appreciated.

An example is below:

library(dplyr)

data_v1 <- tribble(~var1, ~var2,
                    1, 2)

postprocess_data <- function(){
  
  data_v3 <- data_v2 %>% 
    mutate(var2 = var2*3)
  
  data_v3
}


process_data <- function(){
  
  data_v2 <- data_v1 %>% 
    mutate(var1 = var1*3)
  
  data_v3_inside <- postprocess_data()
  
  data_v3_inside
    
}

process_data()

CodePudding user response:

We can get the object from the caller's environment with parent.frame():

postprocess_data <- function(){
  
  data_v2 <- get("data_v2", envir = parent.frame())
  data_v3 <- data_v2 %>% 
    mutate(var2 = var2*3)
  
  data_v3
}


process_data <- function(){
  
  data_v2 <- data_v1 %>% 
    mutate(var1 = var1*3)
  
  data_v3_inside <- postprocess_data()
  
  data_v3_inside
    
}

Testing:

process_data()
# A tibble: 1 × 2
   var1  var2
  <dbl> <dbl>
1     3     6
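
One caveat with this approach: get() searches enclosing environments by default, so if data_v2 is missing from the caller's frame it may silently pick up a global of the same name. A minimal sketch of a stricter lookup follows. It assumes base R only (data.frame and transform stand in for tribble and mutate to keep it dependency-free), and the inherits = FALSE argument and the top_level check are additions for illustration, not part of the answer above.

```r
data_v1 <- data.frame(var1 = 1, var2 = 2)

postprocess_data <- function() {
  # Look only in the caller's frame; inherits = FALSE stops the search there
  data_v2 <- get("data_v2", envir = parent.frame(), inherits = FALSE)
  transform(data_v2, var2 = var2 * 3)
}

process_data <- function() {
  data_v2 <- transform(data_v1, var1 = var1 * 3)
  postprocess_data()
}

result <- process_data()

# Called where no data_v2 exists in the caller's frame, the lookup fails
# with an error instead of falling back to some other environment
top_level <- tryCatch(postprocess_data(), error = function(e) "not found")
```

The trade-off is that postprocess_data now only works when its direct caller defines data_v2; calling it through an intermediate wrapper function would fail.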

CodePudding user response:

Your title describes the right way to do this.

Since process_data only needs to access data_v1, it's fine as it is (though it would be better style if data_v1 were an argument).

But postprocess_data needs access to data_v2, a local variable in process_data. So the ideal design is to define postprocess_data inside process_data. Then it will be able to see all of the variables that are local to process_data, as well as all global variables. For example,

library(dplyr)

data_v1 <- tribble(~var1, ~var2,
                    1, 2)


process_data <- function(){

  postprocess_data <- function(){
  
    data_v3 <- data_v2 %>% 
      mutate(var2 = var2*3)
  
    data_v3
  }


  data_v2 <- data_v1 %>% 
    mutate(var1 = var1*3)
  
  data_v3_inside <- postprocess_data()
  
  data_v3_inside
    
}

process_data()
#> # A tibble: 1 × 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     3     6

Created on 2022-03-23 by the reprex package (v2.0.1)

Edited to add: There's one slightly risky thing in the way I wrote the code here.

If you ever modified process_data() so that it called postprocess_data() before creating data_v2, then the nested function would not find it in the enclosing environment, and would keep looking for it in the global environment. If it happened to find a copy there you might end up with a subtle bug that caused you trouble.
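
A minimal sketch of that failure mode, assuming base R (data.frame and transform stand in for tribble and mutate) and a hypothetical stale data_v2 left over in the outer environment:

```r
data_v1 <- data.frame(var1 = 1, var2 = 2)

# Hypothetical leftover object that happens to share the local variable's name
data_v2 <- data.frame(var1 = 100, var2 = 100)

process_data <- function() {
  postprocess_data <- function() {
    transform(data_v2, var2 = var2 * 3)
  }

  # Bug: the nested function is called before the local data_v2 exists,
  # so the lookup falls through to the outer data_v2
  result <- postprocess_data()

  data_v2 <- transform(data_v1, var1 = var1 * 3)
  result
}

buggy <- process_data()
# buggy is computed from the stale outer data_v2 (var1 = 100, var2 = 300),
# and no error or warning is raised
```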

So a good idea is to create any variable used by the nested function very early, e.g. setting data_v2 <- NULL before the definition of postprocess_data. The NULL value should trigger an error if you haven't replaced it at the time you call the nested function.
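
A minimal sketch of this guard, assuming base R (data.frame and transform stand in for tribble and mutate). The call_too_early argument and the explicit stopifnot() check are hypothetical additions for illustration; with dplyr, mutate() on NULL is expected to fail on its own.

```r
data_v1 <- data.frame(var1 = 1, var2 = 2)
data_v2 <- data.frame(var1 = 100, var2 = 100)  # stale outer copy

process_data <- function(call_too_early = FALSE) {
  # Create the local binding first so lookups never reach the outer data_v2
  data_v2 <- NULL

  postprocess_data <- function() {
    # Explicit check (an addition) so the failure is loud in base R too
    stopifnot(is.data.frame(data_v2))
    transform(data_v2, var2 = var2 * 3)
  }

  if (call_too_early) {
    return(postprocess_data())  # data_v2 is still NULL here
  }

  data_v2 <- transform(data_v1, var1 = var1 * 3)
  postprocess_data()
}

ok <- process_data()
early <- tryCatch(process_data(call_too_early = TRUE),
                  error = function(e) "error")
```

Because the local data_v2 exists from the first line of process_data, the nested function always finds it before the outer one, and a premature call fails instead of using stale data.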

CodePudding user response:

1) Assuming that you have control over process_data and can modify it, insert this as the first line of its body. That makes a local copy of postprocess_data whose environment is the frame of the running process_data call, so any free variables in postprocess_data are looked up there.

environment(postprocess_data) <- environment()
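
For context, here is a minimal sketch of where that line sits. It assumes base R (data.frame and transform stand in for tribble and mutate), and the outer_env and unchanged names are only there to illustrate that the original function is not modified.

```r
data_v1 <- data.frame(var1 = 1, var2 = 2)

postprocess_data <- function() {
  transform(data_v2, var2 = var2 * 3)
}

# Remember the original enclosing environment for the check below
outer_env <- environment(postprocess_data)

process_data <- function() {
  # Local copy of postprocess_data whose enclosing environment is this frame
  environment(postprocess_data) <- environment()

  data_v2 <- transform(data_v1, var1 = var1 * 3)
  postprocess_data()
}

result <- process_data()

# The outer postprocess_data keeps its original environment
unchanged <- identical(environment(postprocess_data), outer_env)
```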

2) If you don't have control over process_data and cannot change it, but do have control over postprocess_data, replace it with this.

  postprocess_data <- function(envir = parent.frame()) with(envir, {       
    data_v3 <- data_v2 %>% 
      mutate(var2 = var2*3)
  
    data_v3
  })

2a) Or replace it with this, which makes it act somewhat like a macro.

  postprocess_data <- function(envir = parent.frame()) eval(substitute({
  
    data_v3 <- data_v2 %>% 
      mutate(var2 = var2*3)
  
    data_v3
  }), envir = envir)
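
One consequence of evaluating the body in the caller's frame is that assignments made in the body land there as well. The sketch below illustrates this for variant 2, assuming base R (data.frame and transform stand in for tribble and mutate); the leaked flag is a hypothetical name used only for the demonstration.

```r
data_v1 <- data.frame(var1 = 1, var2 = 2)

postprocess_data <- function(envir = parent.frame()) with(envir, {
  data_v3 <- transform(data_v2, var2 = var2 * 3)
  data_v3
})

process_data <- function() {
  data_v2 <- transform(data_v1, var1 = var1 * 3)
  result <- postprocess_data()

  # data_v3 was assigned inside with(), so it now exists in this frame
  here <- environment()
  list(result = result,
       leaked = exists("data_v3", envir = here, inherits = FALSE))
}

out <- process_data()
```

Variant 2a also evaluates its body in envir, so the same side effect would be expected there. If that matters, returning the value without assigning it to an intermediate name avoids it.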