Error when trying to use Internal function

I was trying to optimize some code and wanted to use the .Internal implementation of vapply, but I got an error I don't understand. (For now I'll take the warning in ?.Internal seriously, that "Only true R wizards should even consider using this function", and use the user-visible vapply, but I'd like to understand the error better nonetheless.)

test <- rnorm(10000)
test_l <- as.list(test)

test_2 <- .Internal(vapply(test_l, \(x) x^2, numeric(1), FALSE))
# Error: '...' used in an incorrect context

Compare this to the code of the user visible vapply:

function (X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE) 
{
  FUN <- match.fun(FUN)
  if (!is.vector(X) || is.object(X)) 
    X <- as.list(X)
  .Internal(vapply(X, FUN, FUN.VALUE, USE.NAMES))
}

Can someone explain to me why this does not work?

Some of my hypotheses:

Does it run in a different namespace from the function body of the user-visible vapply, or am I calling the user-visible function instead of the .Internal for some other reason?

Is it because the internal vapply is a special internal that works differently from other internals?

The R-ints manual [1] states:

2.2 Special internals

There are also special .Internal functions: NextMethod, Recall, withVisible, cbind, rbind (to allow for the deparse.level argument), eapply, lapply and vapply.

But it does not give more detail.

Can someone give me more detail on those "Special internals"?

[1] https://cran.r-project.org/doc/manuals/R-ints.html#g_t_002eInternal-vs-_002ePrimitive

CodePudding user response：

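The C code behind the internal vapply refers to R_DotsSymbol, i.e., the ellipsis (...), and expects to find it in the calling scope. Your calling scope does not have an ellipsis. Normally it can't, because you are calling from top level and not from inside a function call.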

You can reproduce the error using standard R code like this:

foo <- function() {
  return(...)
}
foo()
#Error in foo() : '...' used in an incorrect context

Or with the .Internal call to vapply like this:

bar <- function(X, FUN, FUN.VALUE, USE.NAMES) {
  .Internal(vapply(X, FUN, FUN.VALUE, USE.NAMES))
}

bar(test_l, \(x) x^2, numeric(1), FALSE)
#Error in bar(test_l, function(x) x^2, numeric(1), FALSE) : 
#  '...' used in an incorrect context

This works because ... exists in the calling scope:

baz <- function(X, FUN, FUN.VALUE, USE.NAMES, ...) {
  .Internal(vapply(X, FUN, FUN.VALUE, USE.NAMES))
}

x <- baz(test_l, \(x) x^2, numeric(1), FALSE)
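
As a small sketch of why the ellipsis matters: assuming the internal code forwards the caller's ... to FUN, as the user-visible vapply does, anything passed through ... of a wrapper like baz should reach the function. The argument name p below is hypothetical and only for illustration.

```r
# Wrapper with '...' in its signature, same shape as baz above
baz <- function(X, FUN, FUN.VALUE, USE.NAMES, ...) {
  .Internal(vapply(X, FUN, FUN.VALUE, USE.NAMES))
}

# 'p' is a made-up extra argument; it travels through baz's '...'
res <- baz(list(1, 2, 3), function(x, p) x^p, numeric(1), FALSE, p = 3)

# Same call through the user-visible vapply, for comparison
ref <- vapply(list(1, 2, 3), function(x, p) x^p, numeric(1), p = 3,
              USE.NAMES = FALSE)

all.equal(res, ref)
```

This is only meant to illustrate the lookup of ... in the calling function; it is not a reason to call .Internal directly.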

You won't be able to produce significantly faster code by skipping those first few lines of vapply. They are not your bottleneck. It might help to implement the function that is repeatedly called by vapply with Rcpp, but a true performance boost can only be achieved by implementing the whole loop with Rcpp. Calls to R closures are expensive and you want to avoid them in loops with many iterations.
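
For this particular toy example you would not even need Rcpp. As a rough sketch (it swaps in plain vectorisation, which is not what the question asked about), the operation can be applied to the whole vector at once so that no R closure is called per element. The seed is arbitrary and only there for reproducibility.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
test   <- rnorm(10000)
test_l <- as.list(test)

# One R closure call per list element
looped <- vapply(test_l, function(x) x^2, numeric(1))

# A single vectorised call; the loop over elements happens in C
vectorised <- unlist(test_l)^2

all.equal(looped, vectorised)
```

Whether this applies to your real code depends on whether the function you call per element can be expressed with vectorised operations.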

CodePudding user response：

Roland's analysis is correct here. There is a hack that allows you to get an ellipsis in the global environment, but it requires the function you pass to vapply to take an extra unused argument:

`...` <- (function(...) get("..."))(y = 2)

So now you could do:

test <- rnorm(10)
test_l <- as.list(test)

.Internal(vapply(test_l, \(x, y) x^2, numeric(1), FALSE))
#>  [1] 1.49370808 0.02969854 4.80764382 2.96895104 0.69506047 1.53488883
#>  [7] 0.12566700 1.27180579 0.08399010 0.02366073

However, this is not recommended. Although there is a very small overhead to calling the .Internal from inside a closure, as Roland says, this is not going to be the rate-limiting factor in your code.

If we measure it:

microbenchmark::microbenchmark(
  hack = {`...` <- (function(...) get("..."))(y = 2);
   .Internal(vapply(test_l, \(x, y) x^2, numeric(1), FALSE))},
  standard = vapply(test_l, \(x) x^2, numeric(1), USE.NAMES = FALSE))

#> Unit: microseconds
#>      expr min  lq   mean median  uq   max neval cld
#>      hack 9.0 9.3  9.690    9.6 9.8  17.8   100   a
#>  standard 6.7 7.0 15.261    7.1 7.2 817.9   100   a

We can see that the hack is slightly faster in the mean, but only because of the occasional outlier in the standard version; its median is actually slower (9.6 vs. 7.1 microseconds). Either way, the difference is on the order of 5 microseconds per call, so you might save yourself 5 milliseconds if you call this routine 1000 times. When you consider the opacity and difficulty of debugging such an approach, it is simply not worth it.

Created on 2022-11-08 with reprex v2.0.2