I'm desperately trying to avoid for loops to calculate custom financial indicators (multiple stocks, 5,000 rows per stock). I'm trying to use purrr::map2, and it is fine when doing math on existing vectors, but I need to reference the lagged (previous) value of the vector I'm trying to create. Without referencing a previous value, purrr::map2 works fine:
some_function <- function(a, b) { (a * b) + ((1 - a) * b) }
a <- c(0.019, 0.026, 0.012, 0.022) # some indicator
b <- c(15.5, 16.7, 14.8, 13.1) # close price
purrr::map2(a, b, some_function)
which just results in the original close values
15.5, 16.7, 14.8, 13.1
But what I'm really trying to do is create a new vector (c), that looks back on itself (lag) as part of the calculation. If it is the first row, c == b, otherwise:
desired_function <- function(a, b, c) { (a * b) + ((1 - a) * lag(c)) }
So I create and populate a vector c and try:
c <- c(15.5, 0, 0, 0)
purrr::map2(a, b, c, desired_function)
And get all NULL values, obviously.
Values for c should be: 15.50, 15.53, 15.52, 15.47
Referencing a previous value is a common thing among indicators, and it forces me to go to clunky, slow 'for loops'. Any suggestions are greatly appreciated.
CodePudding user response:
If calculating a certain value in a vector requires another value from the same vector, then it just can't be vectorized; you'll have to calculate them one after another.
For loops aren't slow by themselves; it's how you use them. For instance, retrieving values from a data frame one value at a time, or inserting them one value at a time, is a common but very slow practice.
The implementation of for loops in R has improved a lot over the past 10 years; allegedly they used to be less efficient, and in older posts you'll find many people complaining about them.
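That said, if you want to avoid writing the loop yourself, base R's Reduce() with accumulate = TRUE expresses exactly this kind of "look back at the value you just computed" recurrence (purrr::accumulate is the tidyverse analogue). A minimal sketch for the recurrence in the question:

```r
a <- c(0.019, 0.026, 0.012, 0.022)  # some indicator
b <- c(15.5, 16.7, 14.8, 13.1)      # close price

# c[1] = b[1]; c[i] = a[i]*b[i] + (1 - a[i])*c[i-1] for i > 1.
# Reduce() walks the indices 2..n, carrying the previous result along
# as `prev`, and with accumulate = TRUE it keeps every intermediate value.
cc <- Reduce(
  function(prev, i) a[i] * b[i] + (1 - a[i]) * prev,
  x = seq_along(a)[-1],
  init = b[1],
  accumulate = TRUE
)
round(cc, 2)
# [1] 15.50 15.53 15.52 15.47
```

This is still sequential under the hood, of course; it just hides the loop behind a fold.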
Recommended reading:
https://www.r-bloggers.com/2018/06/why-loops-are-slow-in-r/
And these two old questions (well, their answers):
Speed up the loop operation in R
CodePudding user response:
Here are some solutions. The first one follows your idea of using stats::lag (written as stats::lag, because the dplyr package masks lag!):
r <- numeric(4L)
for (i in 1:4) {
r[i] <- c[i + 1] <- a[i]*b[i] + (1 - a[i])*stats::lag(c)[i]
}
r
# [1] 15.50000 15.53120 15.52243 15.46913
and another one using a starting value that updates in every iteration, which is about 20% faster.
r <- numeric(4L)
sval <- 15.5
for (i in 1:4) {
r[i] <- sval <- a[i]*b[i] + (1 - a[i])*sval
}
r
# [1] 15.50000 15.53120 15.52243 15.46913
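As an aside: this particular recurrence is linear in c, so it does admit a fully vectorized form via cumprod()/cumsum(). Dividing c[i] through by the running product of (1 - a[k]) turns the recursion into a cumulative sum. A sketch (note the running product can underflow on very long series, so the loop above is the safer general tool):

```r
a <- c(0.019, 0.026, 0.012, 0.022)
b <- c(15.5, 16.7, 14.8, 13.1)

# g[i] = product of (1 - a[k]) for k = 2..i, with g[1] = 1.
g <- cumprod(c(1, 1 - a[-1]))

# Dividing c[i] = a[i]*b[i] + (1 - a[i])*c[i-1] by g[i] gives
# c[i]/g[i] = c[i-1]/g[i-1] + a[i]*b[i]/g[i], i.e. a cumulative sum.
cc <- g * (b[1] + cumsum(c(0, (a * b / g)[-1])))
round(cc, 5)
# [1] 15.50000 15.53120 15.52243 15.46913
```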
Data:
a <- c(0.019, 0.026, 0.012, 0.022)
b <- c(15.5, 16.7, 14.8, 13.1)
c <- c(15.5, 0, 0, 0)