I'm trying to gain a deeper understanding of loops vs. *apply functions in R. Here, I did an experiment where I compute the first 10,000 triangular numbers in 3 different ways.
- `unwrapped`: a simple for loop.
- `wrapped`: the exact same loop from before, but wrapped in a function.
- `vapply`: using `vapply()` and an anonymous function.
The results surprised me in two different ways.
- Why is `wrapped` 8x faster than `unwrapped` (?!?!) My intuition is that given that `wrapped` actually does more stuff (defines a function and then calls it), it should have been slower.
- Why are they both so much faster than `vapply`? I would have expected `vapply` to be able to do some kind of optimization that performs at least as well as the loops.
microbenchmark::microbenchmark(
  unwrapped = {
    x <- numeric(10000)
    for (i in 1:10000) {
      x[i] <- i * (i + 1) / 2
    }
    x
  },
  wrapped = {
    tri_nums <- function(n) {
      x <- numeric(n)
      for (i in 1:n) {
        x[i] <- i * (i + 1) / 2
      }
      x
    }
    tri_nums(10000)
  },
  vapply = vapply(1:10000, \(i) i * (i + 1) / 2, numeric(1)),
  check = 'equal'
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> unwrapped 2652.487 3006.888 3445.896 3150.7555 3832.094 7029.949 100
#> wrapped 398.534 414.010 455.333 439.7445 469.307 656.074 100
#> vapply 4942.000 5154.639 5937.333 5453.2880 5969.760 13730.718 100
Created on 2023-01-04 with reprex v2.0.2
CodePudding user response:
It's byte-compiling your function.
We can confirm just-in-time (JIT) compilation with:
compiler::enableJIT(-1)
# [1] 3 # <--- this is the current JIT level, unchanged
A negative argument reports the current level without changing it, and a value of 3 means the highest JIT compiling level. I'm not certain what steps each level is doing, but we can make a simple test to compare them. (See ?enableJIT for more info.)
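For reference, the help page describes the levels roughly as follows (a paraphrase of ?enableJIT from memory; check the help page itself for the authoritative wording):

```r
# Rough meaning of the JIT levels, per ?enableJIT:
#   0: JIT compilation is disabled
#   1: larger closures are compiled before their first use
#   2: additionally, some small closures are compiled before their second use
#   3: additionally, all top-level loops are compiled before they are executed
compiler::enableJIT(-1)  # negative: report the current level without changing it
```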
compiler::enableJIT(0)
# [1] 3
tri_nums <- function(n) {
  x <- numeric(n)
  for (i in 1:n) {
    x[i] <- i * (i + 1) / 2
  }
  x
}
bench::mark(
  unwrapped = {
    x <- numeric(10000)
    for (i in 1:10000) {
      x[i] <- i * (i + 1) / 2
    }
    x
  },
  JIT0 = tri_nums(10000),
  vapply = vapply(1:10000, \(i) i * (i + 1) / 2, numeric(1))
)
# # A tibble: 3 × 13
# expression min median `itr/sec` mem_al…¹ gc/se…² n_itr n_gc total…³ result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:by> <dbl> <int> <dbl> <bch:t> <list> <list> <list> <list>
# 1 unwrapped 8.21ms 8.7ms 113. 78.2KB 7.07 48 3 424ms <dbl> <Rprofmem> <bench_tm> <tibble>
# 2 JIT0 7.26ms 7.72ms 128. 78.2KB 9.84 52 4 407ms <dbl> <Rprofmem> <bench_tm> <tibble>
# 3 vapply 5.97ms 6.5ms 152. 78.2KB 9.51 64 4 421ms <dbl> <Rprofmem> <bench_tm> <tibble>
# # … with abbreviated variable names ¹mem_alloc, ²`gc/sec`, ³total_time
(I can't put all three levels in a single bench::mark() call, since I believe the JIT check happens both when the function is defined and when it is called. I'm really not qualified to speak to this level of R internals, so ... please correct me and/or add amplifying information.)
Doing this again for levels 1-3 and copy/pasting the relevant bench::mark rows, we see:
# # A tibble: 3 × 13
# expression min median `itr/sec` mem_al…¹ gc/se…² n_itr n_gc total…³ result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:by> <dbl> <int> <dbl> <bch:t> <list> <list> <list> <list>
# 1 unwrapped 8.21ms 8.7ms 113. 78.2KB 7.07 48 3 424ms <dbl> <Rprofmem> <bench_tm> <tibble>
# 2 JIT0 7.26ms 7.72ms 128. 78.2KB 9.84 52 4 407ms <dbl> <Rprofmem> <bench_tm> <tibble>
# 2 JIT1 419.6µs 502.5µs 1923. 108.7KB 0 962 0 500ms <dbl> <Rprofmem> <bench_tm> <tibble>
# 2 JIT2 413.4µs 494.3µs 1971. 108.7KB 0 986 0 500ms <dbl> <Rprofmem> <bench_tm> <tibble>
# 2 JIT3 426.7µs 498.3µs 1981. 108.7KB 0 991 0 500ms <dbl> <Rprofmem> <bench_tm> <tibble>
# 3 vapply 5.97ms 6.5ms 152. 78.2KB 9.51 64 4 421ms <dbl> <Rprofmem> <bench_tm> <tibble>
# # … with abbreviated variable names ¹mem_alloc, ²`gc/sec`, ³total_time
showing that the vast majority of gains are in the first level of byte-compiling (not too surprising given the simplicity of this function).
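You can also see the same effect without toggling JIT levels by byte-compiling a function explicitly with compiler::cmpfun() (a sketch; with JIT at its default level the plain version would get compiled automatically after a call or two, so the contrast is clearest with JIT disabled first):

```r
compiler::enableJIT(0)  # disable JIT so the uncompiled version stays uncompiled

tri_nums <- function(n) {
  x <- numeric(n)
  for (i in 1:n) {
    x[i] <- i * (i + 1) / 2
  }
  x
}
tri_nums_c <- compiler::cmpfun(tri_nums)  # explicitly byte-compiled copy

identical(tri_nums(10000), tri_nums_c(10000))  # same results, different speed
# bench::mark(interpreted = tri_nums(10000), compiled = tri_nums_c(10000))

compiler::enableJIT(3)  # restore the default
```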
Note: for anybody who is actually testing some of this code, you might want to ensure you're back at the default level of 3:
compiler::enableJIT(3)
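Since enableJIT() returns the previous level, a slightly safer pattern (a sketch) is to capture that return value and restore it, rather than hard-coding 3:

```r
old_level <- compiler::enableJIT(0)  # disable JIT, remembering the old level
# ... run the uncompiled benchmarks ...
compiler::enableJIT(old_level)       # restore whatever level was set before
```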