Keep object name when calling function within a function

Time:09-27

I am using a function called "boss" which calls another function called "worker". The worker function takes a dataset (e.g. mtcars) and uses an algorithm provided as a string (e.g. "Algorithm_1") to calculate an output. The worker function returns a dataframe which specifies the data and algorithm used to calculate the respective output.

I want to pass the data object to the boss function, which calls the worker function three times, runs the same calculation with each of the different algorithms, and combines all results in one dataframe so that I can compare them.

My problem is that boss() returns "data" instead of "mtcars" in the column "Data" of the returned object. The argument "data" in the boss function somewhat masks the actual name of the dataset (e.g. mtcars).

This is my code:

# Load data
data("mtcars")
head(mtcars)

# Define worker function
worker <- function(data, algorithm){
  
  # Model: For simplicity, I just generate a random number
  output = runif(1)
  
  # Get passed data name
  data_name <- deparse(substitute(data))
  
  # Create results dataframe and store used data and algorithm to compute test statistic
  results <- data.frame(Data = data_name,
                        Algorithm = algorithm,
                        Output = output) 
  
  
  # Return used data and algorithm and the respective output
  return(results)

}

# Define boss function which calls worker function
boss <- function(data){
  
  # Run and save individual objects (same data but different algorithm)
  model1 <- worker(data, "Algorithm_1")
  model2 <- worker(data, "Algorithm_2")
  model3 <- worker(data, "Algorithm_3")
  
  # combine and output all models in one dataframe
  return(rbind(model1, model2, model3))
  
  
}

worker_output <- worker (mtcars, "Algorithm_1")
worker_output

    Data   Algorithm    Output
1 mtcars Algorithm_1 0.9275785

boss_output <- boss(mtcars)
boss_output

  Data   Algorithm    Output
1 data Algorithm_1 0.9857309
2 data Algorithm_2 0.5066107
3 data Algorithm_3 0.2939690

As you can see, the worker function displays the actual name of the data (e.g. mtcars) used. However, the boss function displays just "data" and not "mtcars".

Any help on how to change that would be much appreciated.

Thank you.

CodePudding user response:

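Your issue is that worker sees the variable under the name boss knows it by: inside boss the dataset is called data, so deparse(substitute(data)) in worker returns "data" rather than "mtcars".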
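To make that concrete, here is a minimal sketch with two hypothetical functions, name_of and wrapper (illustrative names, not part of the question's code). substitute() returns the expression that the immediate caller wrote for the argument, so once wrapper forwards its own parameter x, name_of only ever sees the symbol x.

```r
# Hypothetical helper: reports the expression its caller supplied for `x`.
name_of <- function(x) deparse(substitute(x))

# Hypothetical wrapper: forwards its own argument `x` to name_of().
wrapper <- function(x) name_of(x)

direct  <- name_of(mtcars)  # the caller wrote `mtcars`
wrapped <- wrapper(mtcars)  # inside wrapper(), the caller wrote `x`

print(direct)   # "mtcars"
print(wrapped)  # "x"
```

This is the same situation as boss calling worker(data, ...): the expression worker receives is the symbol data.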

Two options:

  1. My initial answer is to do deparse(substitute(.)) in boss and pass the name to worker. If you cannot change the args in worker, then skip to option 2.

    worker <- function(data, algorithm, data_name) {   ## CHANGE
      # Model: For simplicity, I just generate a random number
      output = runif(1)
      # Get passed data name                           ## CHANGE
      if (missing(data_name)) data_name <- deparse(substitute(data))
      # Create results dataframe and store used data and algorithm to compute test statistic
      results <- data.frame(Data = data_name,
                            Algorithm = algorithm,
                            Output = output)
      # Return used data and algorithm and the respective output
      return(results)
    }
    boss <- function(data){
      nm <- deparse(substitute(data))                  ## ADD
      # Run and save individual objects (same data but different algorithm)
      model1 <- worker(data, "Algorithm_1", nm)        ## CHANGE
      model2 <- worker(data, "Algorithm_2", nm)        ##
      model3 <- worker(data, "Algorithm_3", nm)        ##
      # combine and output all models in one dataframe
      return(rbind(model1, model2, model3))
    }
    

    Demonstration:

    (worker_output <- worker (mtcars, "Algorithm_1"))
    #     Data   Algorithm    Output
    # 1 mtcars Algorithm_1 0.5909222
    (boss_output <- boss(mtcars))
    #     Data   Algorithm    Output
    # 1 mtcars Algorithm_1 0.9313239
    # 2 mtcars Algorithm_2 0.9598333
    # 3 mtcars Algorithm_3 0.7709709
    
  2. Capture the call in boss and pass it along (augmented) to worker. This takes a little more "care" in that we need to append arguments to the original call that are required for worker but not available in boss.

    # worker <- function(data, algorithm) { ... } # your original function
    boss <- function(data){
      cl <- match.call()
      cl[[1]] <- substitute(worker)
      pfrm <- parent.frame()
      do.call(rbind,
              lapply(c("Algorithm_1", "Algorithm_2", "Algorithm_3"),
                     function(algo) {
                       cl[[3]] <- algo
                       eval(cl, envir = pfrm)
                     }))
    }
    

    Demonstration:

    worker(mtcars, "Algorithm_1")
    #     Data   Algorithm     Output
    # 1 mtcars Algorithm_1 0.06736155
    boss(mtcars)
    #     Data   Algorithm    Output
    # 1 mtcars Algorithm_1 0.2186567
    # 2 mtcars Algorithm_2 0.6684886
    # 3 mtcars Algorithm_3 0.4457933
    

    (This method also introduces two optional things: using lapply across a vector of possible algorithm names, and combining the results with do.call(rbind, ...). Neither is required; feel free to do three explicit rounds of cl[[3]] <- "Algorithm_#"; eval(cl, envir = pfrm) instead.)
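The explicit three-call variant mentioned in that note might look like the sketch below. It repeats the question's two-argument worker (slightly condensed) so the block runs on its own. Two details are my own substitutions rather than part of the answer's code: the algorithm is set by name with cl$algorithm instead of by position with cl[[3]], and quote(worker) stands in for substitute(worker).

```r
# The question's original worker, condensed so this sketch is self-contained.
worker <- function(data, algorithm) {
  data_name <- deparse(substitute(data))
  data.frame(Data = data_name, Algorithm = algorithm, Output = runif(1))
}

boss <- function(data) {
  cl <- match.call()        # e.g. boss(data = mtcars)
  cl[[1]] <- quote(worker)  # now worker(data = mtcars)
  pfrm <- parent.frame()    # evaluate where boss() itself was called

  # Assumption: setting the argument by name instead of cl[[3]].
  cl$algorithm <- "Algorithm_1"
  model1 <- eval(cl, envir = pfrm)
  cl$algorithm <- "Algorithm_2"
  model2 <- eval(cl, envir = pfrm)
  cl$algorithm <- "Algorithm_3"
  model3 <- eval(cl, envir = pfrm)

  # Combine all models in one dataframe
  rbind(model1, model2, model3)
}

boss_output <- boss(mtcars)
boss_output
```

Naming the argument keeps the call correct even if worker later gains parameters between data and algorithm. Note that the call is evaluated in the caller's frame, so worker must be visible from there.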

CodePudding user response:

This is the kind of situation where it makes a lot of sense to use a nested function. For example, rewriting your original one:

# Load data
data("mtcars")
head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1



# Define boss function which calls worker function
boss <- function(data){
  
  data_name <- deparse(substitute(data))
  
  # Define worker function
  worker <- function(algorithm){
    
    # Model: For simplicity, I just generate a random number
    output = runif(1)
    
    # Create results dataframe and store used data and algorithm to compute test statistic
    results <- data.frame(Data = data_name,
                          Algorithm = algorithm,
                          Output = output) 
    
    
    # Return used data and algorithm and the respective output
    return(results)
    
  }
  
  # Run and save individual objects (same data but different algorithm)
  model1 <- worker("Algorithm_1")
  model2 <- worker("Algorithm_2")
  model3 <- worker("Algorithm_3")
  
  # combine and output all models in one dataframe
  return(rbind(model1, model2, model3))
  
  
}

boss_output <- boss(mtcars)
boss_output
#>     Data   Algorithm     Output
#> 1 mtcars Algorithm_1 0.01205201
#> 2 mtcars Algorithm_2 0.32131365
#> 3 mtcars Algorithm_3 0.17859081

Created on 2022-09-26 with reprex v2.0.2

An advantage of using nested functions like this is that it leaves your namespace cleaner: only boss can see worker. It also makes the interface to worker simpler.
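A small sketch of that scoping claim, using the hypothetical names boss_nested and worker_nested so it does not clash with the functions above: the inner function is created inside the outer function's environment on each call, so it is not visible from the calling workspace.

```r
# Hypothetical names, chosen only for this illustration.
boss_nested <- function(data) {
  data_name <- deparse(substitute(data))

  # worker_nested lives only in boss_nested's execution environment
  # and finds data_name through lexical scoping.
  worker_nested <- function(algorithm) {
    data.frame(Data = data_name, Algorithm = algorithm, Output = runif(1))
  }

  worker_nested("Algorithm_1")
}

res <- boss_nested(mtcars)

# Checked from outside boss_nested: the inner function is not defined here.
visible_outside <- exists("worker_nested")
print(visible_outside)
```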

There may be disadvantages when debugging, because your test code

worker_output <- worker (mtcars, "Algorithm_1")
worker_output

would have to be run within boss, and you'd need to explicitly print(worker_output) to see it. And for some reason I don't understand, RStudio won't set breakpoints in worker, even though the base setBreakpoint function has no trouble doing that.
