Fast Rolling Correlation in R


I'm trying to perform rolling correlation on a dataset with N rows, where N is greater than 600,000. This is a stock dataset in which each row represents the value of the stock at that minute, so consecutive rows are one minute apart. This link gives an idea of how the code could be implemented.

Nonetheless, since there are so many rows in my dataset, it runs extremely slowly, and I was wondering if there are other possible solutions. One idea: since consecutive windows differ by only one minute, maybe I could update the mean and standard deviation incrementally, so that the cor function does not need to take in the whole window again. I suspect this might not work well, though: the mean is easy to update, but the standard deviation seems to require revisiting the whole window of values again.
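For concreteness, here is a rough sketch of what I mean, computing each window's correlation from running cumulative sums instead of calling cor on the raw window each time (roll_cor is just an illustrative name, and the sum-of-squares formula it uses can lose precision on long, high-magnitude series):

# Sketch: rolling Pearson correlation built from cumulative sums, so each
# window reuses the running totals instead of recomputing from scratch.
roll_cor <- function(x, y, w) {
  n   <- length(x)
  cx  <- cumsum(x);   cy  <- cumsum(y)
  cxx <- cumsum(x^2); cyy <- cumsum(y^2)
  cxy <- cumsum(x * y)
  i <- w:n
  # sum over the window ending at position i:
  # cumulative sum at i minus cumulative sum at i - w
  win <- function(cv) cv[i] - c(0, cv)[i - w + 1]
  sx  <- win(cx); sy <- win(cy)
  num <- win(cxy) - sx * sy / w
  den <- sqrt((win(cxx) - sx^2 / w) * (win(cyy) - sy^2 / w))
  out <- rep(NA_real_, n)   # first w - 1 positions have no full window
  out[i] <- num / den
  out
}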

Would appreciate any help! Thanks!

CodePudding user response:

I don't know what "extremely slow" means to you or how fast you need it to be, but here's a solution that handles simple fake data with 600,000 rows in about 20 seconds on a 2012 laptop, computing the rolling correlation over windows of 60 observations.

# simulate 600,000 one-minute price changes and build two random-walk price series
df <- data.frame(change_a = rnorm(6e5), change_b = rnorm(6e5))
df$price_a <- 100 + cumsum(df$change_a)
df$price_b <- 100 + cumsum(df$change_b)

# rolling correlation over each row plus the 59 rows before it (window = 60)
df$corr60 <- slider::slide2_dbl(df$price_a, df$price_b, cor, .before = 59)

You could also look at the slide_index2_dbl function if you want the rolling window to be defined by a timestamp instead of a fixed number of observations.
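For example, a minimal sketch (assuming a POSIXct timestamp column, here called minute, which is my own illustrative name; the window is the trailing 60 minutes):

library(slider)
library(lubridate)

# illustrative timestamp column; a real dataset would already have one
df$minute <- seq(as.POSIXct("2020-01-01 09:00:00"),
                 by = "1 min", length.out = nrow(df))

# rolling correlation over the current minute plus the 59 minutes before it
df$corr60m <- slide_index2_dbl(df$price_a, df$price_b, df$minute, cor,
                               .before = minutes(59))

The practical difference is gap handling: with an index-based window, any missing minutes (e.g. while the market is closed) shrink the number of observations in the window rather than silently pulling in stale rows.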
