I have a function with one default argument that depends on another argument, and I am seeing some strange behaviour. In the example, the argument is `colheads=colnames(data)`, which depends on `data`.
```r
aaa <- head(iris)
fun <- function(data, colheads=colnames(data), rownames=1:nrow(data), rownames.label="Rowlabel"){
  data <- cbind(rownames, data)
  colnames(data) <- c(rownames.label, colheads)
}
fun(aaa)
```
Here I get an error on the last line, `colnames(data) <- c(rownames.label, colheads)`. It looks like `colheads` is updated because `data` itself is updated on the line before. If I run the same code outside a function, there is no error:
```r
data <- aaa
colheads <- colnames(data)
rownames <- 1:nrow(data)
rownames.label <- "Rowlabel"
data <- cbind(rownames, data)
colnames(data) <- c(rownames.label, colheads)
```
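One way to confirm what goes wrong inside the function is a diagnostic variant that reports sizes instead of assigning names. This is only a sketch: `fun_diag()` is a hypothetical name, and it hard-codes "Rowlabel" instead of taking `rownames.label`.

```r
# Hypothetical diagnostic variant of fun(): it compares the number of
# columns in the modified data with the number of names it would assign.
fun_diag <- function(data, colheads = colnames(data), rownames = 1:nrow(data)) {
  data <- cbind(rownames, data)  # data now has one extra column, "rownames"
  # colheads is evaluated here for the first time, so it reads the NEW data
  c(ncol = ncol(data), n_names = length(c("Rowlabel", colheads)))
}

res <- fun_diag(head(iris))
print(res)  # ncol = 6, n_names = 7: one name too many
```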
Then I added a `print()` inside the function to check where it happens (`debug()` also points to the last line). I still get the error, and again it looks like `colheads` has been updated:
```r
aaa <- head(iris)
fun <- function(data, colheads=colnames(data), rownames=1:nrow(data), rownames.label="Rowlabel"){
  data <- cbind(rownames, data)
  print(colheads)
  colnames(data) <- c(rownames.label, colheads)
}
fun(aaa)
```
But if I also add a `print()` before `data` is updated, the error disappears:
```r
aaa <- head(iris)
fun <- function(data, colheads=colnames(data), rownames=1:nrow(data), rownames.label="Rowlabel"){
  print(colheads)
  data <- cbind(rownames, data)
  print(colheads)
  colnames(data) <- c(rownames.label, colheads)
}
fun(aaa)
```
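The extra `print()` helps because a default argument is a promise that is evaluated only once: the first access computes the value, and later accesses reuse it even if the variables it depended on have since changed. A minimal sketch with a made-up function `f()`:

```r
# Hypothetical example: y's default depends on x.
f <- function(x, y = x * 2) {
  first <- y   # first access: the promise for y is evaluated now, with x = 1
  x <- 100     # changing x afterwards does not re-evaluate y
  c(first = first, later = y)
}

print(f(1))  # both values are 2
```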
I found a workaround using a temporary variable, `colheads.temp`:
```r
aaa <- head(iris)
fun <- function(data, colheads=colnames(data), rownames=1:nrow(data), rownames.label="Rowlabel"){
  colheads.temp <- colheads
  data <- cbind(rownames, data)
  colnames(data) <- c(rownames.label, colheads.temp)
}
fun(aaa)
```
Still, since I am unsure about how R functions work, I am puzzled. Does anyone know what is going on, and how R functions actually work?
Answer:
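Yes, this is called lazy evaluation. The default `colheads=colnames(data)` is not evaluated when the function is called, but only when `colheads` is first used inside the function. It is evaluated inside the function, so it uses whatever value `data` has at that moment. In your function, `data` has already been replaced by `cbind(rownames, data)`, which has an extra `rownames` column, so `colheads` ends up with one name too many. Once evaluated, the value is cached, which is why an extra `print(colheads)` before `data` is reassigned makes the error disappear. Lazy evaluation is useful because an argument that is never used is never evaluated, which saves work (it has other benefits too, but also drawbacks).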
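A small sketch of both points, using made-up functions `g()` and `h()`: a default is computed from the variables' values at first use, and a default that is never used is never computed.

```r
# g(): label's default is evaluated only when label is first used,
# inside the function, so it sees the updated n.
g <- function(n, label = paste("n is", n)) {
  n <- n + 1   # modify n before label is touched
  label        # forcing label now gives "n is 2" for g(1)
}

# h(): the default for unused would raise an error, but it is never
# evaluated because the body never refers to unused.
h <- function(x, unused = stop("this default is never evaluated")) {
  x * 2
}

print(g(1))   # "n is 2"
print(h(21))  # 42, no error
```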
The `force()` function exists to formalize your workaround: putting `force(colheads)` as one of the first lines of your function evaluates `colheads` immediately and locks in its value based on the original `data`.
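Applied to the function from the question, a sketch could look like this. Forcing `rownames` as well is optional here (it is already evaluated inside `cbind()` before `data` is reassigned), and returning `data` explicitly at the end is an addition: the original function returns the value of its last assignment, i.e. the names vector, invisibly.

```r
fun <- function(data, colheads = colnames(data), rownames = 1:nrow(data),
                rownames.label = "Rowlabel") {
  force(colheads)  # evaluate colheads now, against the original data
  force(rownames)  # rownames also depends on data, so lock it in too
  data <- cbind(rownames, data)
  colnames(data) <- c(rownames.label, colheads)
  data             # return the modified data frame explicitly
}

out <- fun(head(iris))
print(colnames(out))
```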
If you'd like to learn more, I'd suggest reading the Functions chapter of Advanced R, or at least its section on lazy evaluation.