I'm desperately trying to avoid for loops to calculate custom financial indicators (multiple stocks, 5,000 rows per stock). I'm trying to use purrr::map2, and it is fine when doing math on existing vectors, but I need to reference the lagged (previous) value of the vector I'm trying to create. Without referencing a previous value, purrr::map2 works fine:
some_function <- function(a, b) { (a * b) + ((1 - a) * b) }
a <- c(0.019, 0.026, 0.012, 0.022) # some indicator
b <- c(15.5, 16.7, 14.8, 13.1) # close price
purrr::map2(a, b, some_function)
which just results in the original close values
15.5, 16.7, 14.8, 13.1
But what I'm really trying to do is create a new vector (c), that looks back on itself (lag) as part of the calculation. If it is the first row, c == b, otherwise:
desired_function <- function(a, b, c) { (a * b) + ((1 - a) * lag(c)) }
So I create and populate a vector c and try:
c <- c(15.5, 0, 0, 0)
purrr::map2(a, b, c, desired_function)
And get all NULL values, obviously.
Values for c should be: 15.50, 15.53, 15.52, 15.47
Referencing a previous value is a common thing among indicators, and it forces me to go to clunky, slow 'for loops'. Any suggestions are greatly appreciated.
CodePudding user response:
If calculating a certain value in a vector requires another value from the same vector, then it just can't be vectorized; you'll have to calculate them one after another.
For loops aren't slow by themselves; it's how you use them. For instance, retrieving values from a data frame one value at a time, or inserting them one value at a time, is a common but very slow practice.
The implementation of for loops in R has improved a lot over the past 10 years; allegedly they used to be less efficient, and in older posts you'll find many people complaining about them.
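That said, if you want to avoid writing the loop yourself, base R's Reduce() with accumulate = TRUE expresses exactly this kind of "look back at the value you just computed" recurrence (purrr::accumulate is the tidyverse analogue). A minimal sketch for the recurrence in the question:

```r
a <- c(0.019, 0.026, 0.012, 0.022)  # some indicator
b <- c(15.5, 16.7, 14.8, 13.1)      # close price

# c[1] = b[1]; c[i] = a[i]*b[i] + (1 - a[i])*c[i-1] for i > 1.
# Reduce() walks the indices 2..n, carrying the previous result along
# as `prev`, and with accumulate = TRUE it keeps every intermediate value.
cc <- Reduce(
  function(prev, i) a[i] * b[i] + (1 - a[i]) * prev,
  x = seq_along(a)[-1],
  init = b[1],
  accumulate = TRUE
)
round(cc, 2)
# [1] 15.50 15.53 15.52 15.47
```

This is still sequential under the hood, of course; it just hides the loop behind a fold.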
Recommended reading:
https://www.r-bloggers.com/2018/06/why-loops-are-slow-in-r/
And these two old questions (well, their answers):
Speed up the loop operation in R
CodePudding user response:
Here are some solutions. The first one follows your idea of using stats::lag (written as stats::lag, because the dplyr package masks lag!):
r <- numeric(4L)
for (i in 1:4) {
r[i] <- c[i + 1] <- a[i]*b[i] + (1 - a[i])*stats::lag(c)[i]
}
r
# [1] 15.50000 15.53120 15.52243 15.46913
and another one using a starting value that updates in every iteration, which is about 20% faster.
r <- numeric(4L)
sval <- 15.5
for (i in 1:4) {
r[i] <- sval <- a[i]*b[i] + (1 - a[i])*sval
}
r
# [1] 15.50000 15.53120 15.52243 15.46913
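As an aside: this particular recurrence is linear in c, so it does admit a fully vectorized form via cumprod()/cumsum(). Dividing c[i] through by the running product of (1 - a[k]) turns the recursion into a cumulative sum. A sketch (note the running product can underflow on very long series, so the loop above is the safer general tool):

```r
a <- c(0.019, 0.026, 0.012, 0.022)
b <- c(15.5, 16.7, 14.8, 13.1)

# g[i] = product of (1 - a[k]) for k = 2..i, with g[1] = 1.
g <- cumprod(c(1, 1 - a[-1]))

# Dividing c[i] = a[i]*b[i] + (1 - a[i])*c[i-1] by g[i] gives
# c[i]/g[i] = c[i-1]/g[i-1] + a[i]*b[i]/g[i], i.e. a cumulative sum.
cc <- g * (b[1] + cumsum(c(0, (a * b / g)[-1])))
round(cc, 5)
# [1] 15.50000 15.53120 15.52243 15.46913
```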
Data:
a <- c(0.019, 0.026, 0.012, 0.022)
b <- c(15.5, 16.7, 14.8, 13.1)
c <- c(15.5, 0, 0, 0)