Matching function in R: match.fun vs deparse(substitute()) vs supplying function "directly"

fun1, fun2 and fun3 seem to work as expected:

fun1 <- function(fun, x) {
  fun(x)
}

fun1(mean, 1:10)
fun1(as.character, 1:10)
fun1(notafun, 1:10)

fun2 <- function(fun, x) {
  fun <- match.fun(fun)
  fun(x)
}

fun2(mean, 1:10)
fun2(as.character, 1:10)
fun2(notafun, 1:10)

fun3 <- function(fun, x) {
  fun <- deparse(substitute(fun))
  do.call(fun, list(x))
}

fun3(mean, 1:10)
fun3(as.character, 1:10)
fun3(notafun, 1:10)

Is one strategy to be preferred in general? So far, I have only noticed that match.fun also works if fun is specified as a string.

My use case is a non-exported function in a package for local use (where it is not a limitation if I can't specify fun as a string). Are there any benefits to using match.fun instead of supplying the function "directly" (as in fun1)?

CodePudding user response：

One key difference is that fun3 will fail if called inside an enclosing function, e.g.:

g <- function(f, x)
{
    fun3(f, x)
}

g(mean, 1:10)
# Error in f(1:10) : could not find function "f"

In general, try to avoid nonstandard evaluation tricks unless absolutely necessary.
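
To see why the wrapper breaks, it may help to look at what deparse(substitute()) actually captures. The following is a minimal sketch; captured_name, outer_wrapper, call_matched and outer_call are hypothetical names used only for illustration.

```r
# Hypothetical helper: returns the text of the expression supplied as `fun`.
captured_name <- function(fun) deparse(substitute(fun))

# Hypothetical wrapper that forwards its own argument `f`.
outer_wrapper <- function(f) captured_name(f)

direct_name  <- captured_name(mean)   # "mean"
wrapped_name <- outer_wrapper(mean)   # "f", the wrapper's argument name

# A match.fun-based version evaluates the argument, so forwarding works.
call_matched <- function(fun, x) {
  fun <- match.fun(fun)
  fun(x)
}
outer_call <- function(f, x) call_matched(f, x)
forwarded <- outer_call(mean, 1:10)   # 5.5
```

The captured text is the name used at the call site, not the function the caller originally had in mind, which is why the lookup for "f" fails.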

CodePudding user response：

First, documentation! Here are relevant sections from ?match.fun:

When called inside functions that take a function as argument, extract the desired function object while avoiding undesired matching to objects of other types.

If FUN is a function, it is returned. If it is a symbol (for example, enclosed in backquotes) or a character vector of length one, it will be looked up using get in the environment of the parent of the caller.
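
A small sketch of the "undesired matching" case may make this concrete. It assumes a caller that has a non-function object named mean in scope; call_direct, call_matched and caller are hypothetical names, not part of the question.

```r
# Hypothetical helpers for illustration only.
call_direct <- function(fun, x) fun(x)
call_matched <- function(fun, x) {
  fun <- match.fun(fun)
  fun(x)
}

caller <- function(use_match) {
  mean <- 5  # a non-function object that masks base::mean in this frame
  if (use_match) call_matched(mean, 1:10) else call_direct(mean, 1:10)
}

# match.fun looks the name up again, accepting only functions.
matched <- caller(TRUE)   # 5.5

# Without match.fun, `fun` is bound to the number 5 and the call fails.
direct <- tryCatch(caller(FALSE), error = function(e) "error")
```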

Thus, match.fun serves two main purposes:

It gives users the option of passing strings and symbols instead of functions.
It defends against pathological user input by throwing an error if the user passes a non-function (including strings and symbols that do not match any function known to the calling environment).
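
Both purposes can be sketched in a few lines. apply_fun is a hypothetical name, and the exact error messages are left out because they may vary between R versions.

```r
# Hypothetical wrapper, equivalent in spirit to fun2 from the question.
apply_fun <- function(FUN, x) {
  FUN <- match.fun(FUN)
  FUN(x)
}

# Purpose 1: functions, strings and symbols are all accepted.
from_function <- apply_fun(mean, 1:10)
from_string   <- apply_fun("mean", 1:10)
from_symbol   <- apply_fun(quote(mean), 1:10)

# Purpose 2: input that does not resolve to a function is rejected early.
bad_name  <- tryCatch(apply_fun("notafun", 1:10), error = function(e) e)
bad_value <- tryCatch(apply_fun(42, 1:10), error = function(e) e)
```

In both failing cases the error is raised by match.fun itself, before FUN(x) is ever attempted.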

It is almost always better to use match.fun (as in your fun2) than to do nothing (as in your fun1), even if your use case is an unexported function, because the assurance that it provides has virtually no performance cost, unless you care about microseconds:

f1 <- mean
f2 <- "mean"
f3 <- quote(mean)
microbenchmark::microbenchmark(match.fun(f1), match.fun(f2), match.fun(f3), times = 1000L)
# Unit: nanoseconds
#           expr  min   lq     mean median   uq   max neval
#  match.fun(f1)  328  369  416.806    410  410  1558  1000
#  match.fun(f2) 1845 1927 2123.759   1968 2091 33251  1000
#  match.fun(f3) 1804 1886 2106.088   1927 2091 26937  1000

Your fun3 is unique in that it allows users to pass unevaluated expressions, but that approach is problematic for multiple reasons:

It fails to address the core issue with fun1: there is no guarantee that the string returned by deparse names a function that R can find.
It will not work as expected inside of other functions; see @Hong Ooi's comment/answer.

You cannot pass functions accessed with a double or triple colon operator:

fun3(base::mean, 1:10)
# Error in `base::mean`(1:10) : could not find function "base::mean"

Even if it works, it is mostly smoke and mirrors: if the result of deparse(substitute(fun)) is a string naming a function accessible from the calling environment, then there was no need for deparse(substitute(fun)) in the first place, because fun would have evaluated to that function anyway. It does extra work for nothing.
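
For contrast, here is a rough sketch of how a match.fun-based function handles the inputs that trip up fun3. via_match and captured_name are hypothetical names used only for this illustration.

```r
# Hypothetical match.fun-based wrapper (same idea as fun2).
via_match <- function(fun, x) {
  fun <- match.fun(fun)
  fun(x)
}

# Namespaced and anonymous functions are ordinary function objects.
namespaced <- via_match(base::mean, 1:10)                       # 5.5
anonymous  <- via_match(function(v) sum(v) / length(v), 1:10)   # 5.5

# What deparse(substitute()) would hand to do.call instead:
captured_name <- function(fun) deparse(substitute(fun))
as_text <- captured_name(base::mean)   # "base::mean"

# No object is bound to that literal name, hence the failed lookup.
name_exists <- exists(as_text)         # FALSE
```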

Overall, it is good practice to use match.fun whenever you expect functions as arguments, except perhaps if you want to accept functions but not strings or symbols. Even then, you would be well advised to validate the argument explicitly:

function(FUN, ...) {
  stopifnot(is.function(FUN))
  ## do stuff
}
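
A filled-in version of that skeleton might look like the sketch below. apply_strict is a hypothetical name, and the body simply calls FUN as a stand-in for "do stuff".

```r
# Hypothetical strict variant: accepts function objects only.
apply_strict <- function(FUN, x, ...) {
  stopifnot(is.function(FUN))
  FUN(x, ...)
}

# A function object passes the check; extra arguments are forwarded.
accepted <- apply_strict(mean, c(1, 2, NA), na.rm = TRUE)   # 1.5

# A string is rejected, unlike with match.fun.
rejected <- tryCatch(
  apply_strict("mean", 1:10),
  error = function(e) conditionMessage(e)
)
```

The trade-off is that callers lose the convenience of passing "mean", in exchange for a stricter contract.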