I am working on a benchmarking problem and would like to use R's vectorization for faster calculation, but I do not have a clue how to improve the speed. Help is much appreciated.
function(n = 5, lower = 1, upper = 4, add = 1) {
  result <- c(lower, upper)
  for (i in 3:n) {
    # each new value is the sum of the previous two values plus `add`
    result <- append(result, result[[i - 1]] + result[[i - 2]] + add)
  }
  result
}
My ideas included lapply/vapply as well as some sort of recursion.
CodePudding user response:
Don't use append in a loop. This is called "growing an object", where the result object gets bigger every iteration. It's notoriously inefficient because as the object gets bigger, your computer has to find bigger and bigger places to store it in memory, moving it around and copying it a lot.
Instead, initialize result to its full length from the start. Set all the values you don't know yet to a placeholder such as NA (or 0, which is what integer() gives you in bar below) and fill them in as you go.
# original
foo = function(n = 5, lower = 1, upper = 4, add = 1) {
  result <- c(lower, upper)
  for (i in 3:n) {
    # grows `result` by one element per iteration (slow)
    result <- append(result, result[[i - 1]] + result[[i - 2]] + add)
  }
  result
}
foo()
bar = function(n = 5, lower = 1, upper = 4, add = 1) {
  # initialize to full length
  result = integer(length = n)
  # set first two entries
  result[1:2] <- c(lower, upper)
  for (i in 3:n) {
    # fill in the rest of the blanks
    result[i] <- result[i - 1] + result[i - 2] + add
  }
  result
}
## same result
identical(foo(), bar())
# [1] TRUE
## about 40x faster when n = 1000 (looking at the iterations per second)
bench::mark(foo(n = 1000), bar(n = 1000))
# # A tibble: 2 × 13
#   expression         min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
#   <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
# 1 foo(n = 1000)   1.73ms   1.95ms      497.    3.86MB    39.3    177    14      356ms <dbl [1…
# 2 bar(n = 1000)  51.87µs  53.46µs    18439.   11.81KB     4.13  8936     2      485ms <dbl [1…
# # … with 3 more variables: memory <list>, time <list>, gc <list>
Also note that with vectors you only need single brackets, [. Use double brackets, [[, to extract a single item from a list-class object.
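To make the bracket point concrete, here is a small sketch. The objects v and l are made up for illustration and are not part of the question's code.

```r
v <- c(1, 4, 6)            # an atomic (numeric) vector
l <- list(a = 1, b = "x")  # a list

# On an atomic vector, single brackets are enough to get one value
second <- v[2]

# On a list, single brackets return a (shorter) list ...
sub_list <- l["a"]

# ... while double brackets return the element itself
element <- l[["a"]]

is.list(sub_list)  # the single-bracket result is still a list
is.list(element)   # the double-bracket result is the bare value
```

Double brackets also work on atomic vectors, which is why the original loop runs, but they add nothing there.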
CodePudding user response:
First of all, do not use recursion, which slows down your performance. Also, you can use a pre-allocated vector to store the updated values. Below is a benchmark:
# OP's solution
f <- function(n = 10, lower = 1, upper = 4, add = 1) {
  result <- c(lower, upper)
  for (i in 3:n) {
    result <- append(result, result[[i - 1]] + result[[i - 2]] + add)
  }
  result
}
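# A recursion implementation
f1 <- function(n = 10, lower = 1, upper = 4, add = 1) {
  if (n <= 2) {
    return(c(lower, upper)[1:n])
  }
  # pass the arguments along so non-default values are not lost in the recursion
  v <- Recall(n - 1, lower, upper, add)
  c(v, sum(tail(v, 2)) + add)
}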
# for-loop version with pre-allocated vector
f2 <- function(n = 10, lower = 1, upper = 4, add = 1) {
  v <- numeric(n)
  for (i in 1:n) {
    if (i <= 2) {
      # the first two entries are the starting values
      v[i] <- c(lower, upper)[i]
    } else {
      v[i] <- v[i - 1] + v[i - 2] + add
    }
  }
  v
}
and you will see
> microbenchmark(f(), f1(), f2())
Unit: microseconds
 expr  min   lq    mean median    uq     max neval
  f() 10.5 11.0 150.894  11.60 12.30 13738.9   100
 f1() 68.1 69.3 170.973  70.95 82.25  6796.3   100
 f2()  2.7  2.9 163.506   3.20  3.80 15966.3   100
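One caveat with the loops that use 3:n: when n is 2, 3:n counts down (3, 2), so the loop body still runs even though there is nothing left to fill in. Below is a minimal sketch of a pre-allocated version that also copes with small n. The name seq_add is made up for illustration, and it assumes you want fewer than two values back when n < 2.

```r
# Hypothetical helper (the name is illustrative, not from the answers above)
seq_add <- function(n = 5, lower = 1, upper = 4, add = 1) {
  # pre-allocate a double vector of the final length
  result <- numeric(n)

  # copy in as many starting values as fit (covers n = 0, 1, 2)
  start <- c(lower, upper)
  k <- min(n, 2L)
  result[seq_len(k)] <- start[seq_len(k)]

  # seq_len(n)[-(1:2)] is empty when n < 3, so the loop is skipped
  for (i in seq_len(n)[-(1:2)]) {
    result[i] <- result[i - 1] + result[i - 2] + add
  }
  result
}

seq_add(5)  # 1 4 6 11 18
seq_add(2)  # 1 4
```

For n >= 3 this follows the same recurrence as the pre-allocated loops above, so it is only worth the extra lines if small n can actually occur in your use case.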