I am using a function called "boss" which calls another function called "worker". The worker function takes a dataset (e.g. mtcars) and uses an algorithm provided as a string (e.g. "Algorithm_1") to calculate an output. The worker function returns a dataframe which specifies the data and algorithm used to calculate the respective output.
I want to pass the data object to the boss function, which calls the worker function three times to make the same calculation with the different algorithms and combines all results in one dataframe so that I can compare them.
My problem is that boss() returns "data" instead of "mtcars" in the column "Data" of the returned object. The argument "data" in the boss function somehow masks the actual name of the dataset (e.g. mtcars).
This is my code:
# Load data
data("mtcars")
head(mtcars)
# Define worker function
worker <- function(data, algorithm){
  # Model: For simplicity, I just generate a random number
  output <- runif(1)
  # Get passed data name
  data_name <- deparse(substitute(data))
  # Create results dataframe and store used data and algorithm to compute test statistic
  results <- data.frame(Data = data_name,
                        Algorithm = algorithm,
                        Output = output)
  # Return used data and algorithm and the respective output
  return(results)
}
# Define boss function which calls worker function
boss <- function(data){
  # Run and save individual objects (same data but different algorithm)
  model1 <- worker(data, "Algorithm_1")
  model2 <- worker(data, "Algorithm_2")
  model3 <- worker(data, "Algorithm_3")
  # combine and output all models in one dataframe
  return(rbind(model1, model2, model3))
}
worker_output <- worker(mtcars, "Algorithm_1")
worker_output
#>     Data   Algorithm    Output
#> 1 mtcars Algorithm_1 0.9275785
boss_output <- boss(mtcars)
boss_output
#>   Data   Algorithm    Output
#> 1 data Algorithm_1 0.9857309
#> 2 data Algorithm_2 0.5066107
#> 3 data Algorithm_3 0.2939690
As you can see, the worker function displays the actual name of the data (e.g. mtcars) used. However, the boss function displays just "data" and not "mtcars".
Any help on how to change that would be much appreciated.
Thank you.
Answer 1:
Your issue is that worker sees the variable under the name boss knows it by (data), not the name you used when calling boss (mtcars).
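To see the mechanism in isolation, here is a minimal sketch with two hypothetical functions, inner() and outer(), which are not part of the question's code. substitute() only recovers the expression written by the immediate caller, so when outer() forwards its argument, inner() sees outer()'s argument name rather than the original expression.

```r
# Hypothetical helpers, not part of the question's code.
# inner() reports the expression its caller wrote for x.
inner <- function(x) deparse(substitute(x))

# outer() forwards its own argument y to inner().
outer <- function(y) inner(y)

inner(mtcars)  # "mtcars": inner() was called directly with mtcars
outer(mtcars)  # "y": inner() only sees the name outer() used
```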
Two options:
Option 1: My initial answer is to call deparse(substitute(.)) in boss and pass the name to worker. If you cannot change the arguments of worker, skip to option 2.
worker <- function(data, algorithm, data_name) {  ## CHANGE
  # Model: For simplicity, I just generate a random number
  output <- runif(1)
  # Get passed data name
  ## CHANGE: only fall back to substitute() when no name was passed in
  if (missing(data_name)) data_name <- deparse(substitute(data))
  # Create results dataframe and store used data and algorithm to compute test statistic
  results <- data.frame(Data = data_name,
                        Algorithm = algorithm,
                        Output = output)
  # Return used data and algorithm and the respective output
  return(results)
}
boss <- function(data){
  nm <- deparse(substitute(data))  ## ADD: capture the caller's name here
  # Run and save individual objects (same data but different algorithm)
  model1 <- worker(data, "Algorithm_1", nm)  ## CHANGE
  model2 <- worker(data, "Algorithm_2", nm)  ##
  model3 <- worker(data, "Algorithm_3", nm)  ##
  # combine and output all models in one dataframe
  return(rbind(model1, model2, model3))
}
Demonstration:
(worker_output <- worker(mtcars, "Algorithm_1"))
#     Data   Algorithm    Output
# 1 mtcars Algorithm_1 0.5909222
(boss_output <- boss(mtcars))
#     Data   Algorithm    Output
# 1 mtcars Algorithm_1 0.9313239
# 2 mtcars Algorithm_2 0.9598333
# 3 mtcars Algorithm_3 0.7709709
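A possible variant of option 1, sketched here on the assumption that you can change worker's signature: give data_name a default value of deparse(substitute(data)). R evaluates default arguments lazily inside the function, so the fallback still captures the caller's expression when worker is called directly, while boss can override it. The boss below is a condensed version of the one above.

```r
# Variant of option 1 (a sketch, not the answer's exact code): the default
# for data_name is evaluated inside worker(), so substitute(data) still
# yields the caller's expression when no name is supplied.
worker <- function(data, algorithm, data_name = deparse(substitute(data))) {
  # Model: for simplicity, just generate a random number
  output <- runif(1)
  data.frame(Data = data_name, Algorithm = algorithm, Output = output)
}

boss <- function(data) {
  # Capture the caller's expression once, then forward it explicitly
  nm <- deparse(substitute(data))
  rbind(worker(data, "Algorithm_1", nm),
        worker(data, "Algorithm_2", nm),
        worker(data, "Algorithm_3", nm))
}

worker(mtcars, "Algorithm_1")$Data  # "mtcars"
boss(mtcars)$Data                   # "mtcars" "mtcars" "mtcars"
```

The missing() check and the default argument behave the same way here; the default simply documents the fallback in the function signature.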
Option 2: Capture the call in boss and pass it along (augmented) to worker. This takes a little more care, because we need to append arguments to the original call that worker requires but boss does not have.
# worker <- function(data, algorithm) { ... }  # your original function
boss <- function(data){
  cl <- match.call()             # the call as written, e.g. boss(data = mtcars)
  cl[[1]] <- substitute(worker)  # swap in the function name: worker(data = mtcars)
  pfrm <- parent.frame()         # the environment boss() was called from
  do.call(rbind, lapply(c("Algorithm_1", "Algorithm_2", "Algorithm_3"), function(algo) {
    cl[[3]] <- algo              # append the algorithm argument
    eval(cl, envir = pfrm)       # worker() now sees mtcars, not data
  }))
}
Demonstration:
worker(mtcars, "Algorithm_1")
#     Data   Algorithm     Output
# 1 mtcars Algorithm_1 0.06736155
boss(mtcars)
#     Data   Algorithm    Output
# 1 mtcars Algorithm_1 0.2186567
# 2 mtcars Algorithm_2 0.6684886
# 3 mtcars Algorithm_3 0.4457933
(This method also introduces two optional things: using lapply across a vector of possible algorithm names, and combining the results with do.call(rbind, ...). Neither is required; feel free to make three separate calls to cl[[3]] <- "Algorithm_#"; eval(cl, envir = pfrm) instead.)
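For reference, a sketch of option 2 with three explicit calls instead of lapply() and do.call(), as the parenthetical note suggests. It assumes the question's original two-argument worker (condensed here so the block runs on its own) and uses quote(worker), which gives the same symbol that substitute(worker) produces inside boss.

```r
# Condensed version of the question's original two-argument worker()
worker <- function(data, algorithm) {
  data.frame(Data = deparse(substitute(data)),
             Algorithm = algorithm,
             Output = runif(1))
}

boss <- function(data) {
  cl <- match.call()        # e.g. boss(data = mtcars)
  cl[[1]] <- quote(worker)  # now worker(data = mtcars)
  pfrm <- parent.frame()    # evaluate where boss() was called
  # Append the algorithm as the third element and evaluate, three times
  cl[[3]] <- "Algorithm_1"
  model1 <- eval(cl, envir = pfrm)
  cl[[3]] <- "Algorithm_2"
  model2 <- eval(cl, envir = pfrm)
  cl[[3]] <- "Algorithm_3"
  model3 <- eval(cl, envir = pfrm)
  rbind(model1, model2, model3)
}

boss(mtcars)$Data  # "mtcars" "mtcars" "mtcars"
```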
Answer 2:
This is the kind of situation where a nested function makes a lot of sense. For example, rewriting your original code:
# Load data
data("mtcars")
head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Define boss function which calls worker function
boss <- function(data){
  data_name <- deparse(substitute(data))
  # Define worker function
  worker <- function(algorithm){
    # Model: For simplicity, I just generate a random number
    output <- runif(1)
    # Create results dataframe and store used data and algorithm to compute test statistic
    results <- data.frame(Data = data_name,
                          Algorithm = algorithm,
                          Output = output)
    # Return used data and algorithm and the respective output
    return(results)
  }
  # Run and save individual objects (same data but different algorithm)
  model1 <- worker("Algorithm_1")
  model2 <- worker("Algorithm_2")
  model3 <- worker("Algorithm_3")
  # combine and output all models in one dataframe
  return(rbind(model1, model2, model3))
}
boss_output <- boss(mtcars)
boss_output
#>     Data   Algorithm     Output
#> 1 mtcars Algorithm_1 0.01205201
#> 2 mtcars Algorithm_2 0.32131365
#> 3 mtcars Algorithm_3 0.17859081
Created on 2022-09-26 with reprex v2.0.2
An advantage of using nested functions like this is that it leaves your namespace cleaner: only boss can see worker. It also makes the interface to worker simpler.
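Because worker is a closure over boss's environment, the nested version also combines naturally with lapply(). The sketch below is an illustration rather than part of the original answer; the algorithms parameter is a hypothetical addition that makes the list of algorithm names configurable.

```r
boss <- function(data, algorithms = c("Algorithm_1", "Algorithm_2", "Algorithm_3")) {
  data_name <- deparse(substitute(data))  # captured once, in boss()
  # worker() is a closure: it reads data_name (and could read data)
  # from boss()'s environment, so it only needs the algorithm name.
  worker <- function(algorithm) {
    # Model: for simplicity, just generate a random number
    data.frame(Data = data_name, Algorithm = algorithm, Output = runif(1))
  }
  # One single-row data frame per algorithm, stacked into one result
  do.call(rbind, lapply(algorithms, worker))
}

boss(mtcars)                            # three rows, Data is "mtcars"
boss(iris, algorithms = "Algorithm_2")  # one row, Data is "iris"
```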
There may be disadvantages when debugging, because your test code
worker_output <- worker (mtcars, "Algorithm_1")
worker_output
would have to be run within boss (and adapted to the nested worker's one-argument signature), and you'd need to explicitly print(worker_output) to see it. And for some reason I don't understand, RStudio won't set breakpoints in worker, even though the base setBreakpoint function has no trouble doing that.