I am looking to speed up the following mean-autocorrelation code, which uses numpy's standard correlation function. Is it possible to vectorize this?
input: X -> shape (100, 3000) = 100-day time series of 3000 items
output: float -> the mean (across the 3000 items) of the lag-5 autocorrelation of each item
import numpy as np
import time
A = np.random.rand(100,3000)
start_time = time.time()
np.nanmean(np.array([np.corrcoef(np.array([A[:-5,i],A[5:,i]]))[1,0] for i in range(A.shape[1])]),axis=0)
print("--- %s seconds ---" % (time.time() - start_time))
--- 0.35528588 seconds ---
CodePudding user response:
I assume you mainly care about smaller runtime and less so about how it is achieved. The following approach achieves a more than 3x speedup by improving the calculations of the correlation coefficients.
More specifically, it replaces the calls to np.corrcoef with calls to a custom function fast_corrcoef. In essence, fast_corrcoef follows the numpy approach for calculating the correlation coefficient, but skips a lot of redundant computation. Each call to np.corrcoef calls np.cov to compute the covariance matrix, which requires the averages of both series S[:-lag] and S[lag:]; computing those averages separately, as np.cov does, repeats work, because the two series overlap everywhere except at their lag-sized ends. Instead, fast_corrcoef first calculates the sum of the shared middle part S[lag:-lag] and then derives the averages of both series from that intermediate sum. For lagged series of length n, this takes only (n - 2 * lag) + 2 * lag + 2 = n + 2 additions as opposed to 2n, i.e. it saves essentially 50% of all additions needed for the averages; in the case of A.shape = (100, 3000) and lag = 5 this amounts to (2 * (100 - 5) - (100 - 5 + 2)) * 3000 = 279000 saved additions.
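To make the shared-sum idea concrete, here is a tiny standalone check (my illustration, not part of the answer's code) that the two means recovered from the one middle sum match what np.mean computes directly:

```python
import numpy as np

A = np.random.rand(100)            # one item's 100-day series
lag = 5
X0, X1 = A[:-lag], A[lag:]         # the two lagged series, each of length 95
n = X0.size

# The overlap A[lag:-lag] is shared by both series, so sum it once...
mid_sum = X0[lag:].sum()           # == X1[:-lag].sum()
# ...and top it up with each series' lag-sized end to get both means
mean0 = (mid_sum + X0[:lag].sum()) / n
mean1 = (mid_sum + X1[-lag:].sum()) / n
```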
Further, since the covariance matrix is symmetric, a full matrix multiplication is not required when calculating it. The call to np.dot used by np.cov has therefore been replaced by an expression that performs only the three vector dot products that are actually necessary, i.e. it calculates the covariance only once instead of twice.
import numpy as np
import time
def fast_corrcoef(X, lag):
    n = X.shape[1]
    # sum of the middle part shared by both rows: X[0, lag:] == X[1, :-lag]
    mid_sum = X[0, lag:].sum()
    # derive both means from the shared sum, then de-mean both rows in place
    X -= np.array([(mid_sum + X[0, :lag].sum()) / n, (mid_sum + X[1, -lag:].sum()) / n])[:, None]
    return (X[0, :] * X[1, :]).sum() / (np.sqrt((X[0, :] ** 2).sum()) * np.sqrt((X[1, :] ** 2).sum()))
A = np.random.rand(100,3000)
lag = 5
def orig(A):
    return np.nanmean(np.array([np.corrcoef(np.array([A[:-5, i], A[5:, i]]))[1, 0] for i in range(A.shape[1])]), axis=0)
def faster(A, lag):
    corrcoefs = np.empty([A.shape[1]])
    for i in range(A.shape[1]):
        corrcoefs[i] = fast_corrcoef(np.array([A[:-lag, i], A[lag:, i]]), lag)
    return np.nanmean(corrcoefs, axis=0)
start_time = time.time()
res_orig = orig(A)
runtime_orig = time.time() - start_time
print(f'orig: runtime: {runtime_orig:.4f}s; result: {res_orig}')
start_time = time.time()
res_faster = faster(A, lag)
runtime_faster = time.time() - start_time
print(f'fast: runtime: {runtime_faster:.4f}s; result: {res_faster}')
assert abs(res_orig - res_faster) < 1e-16
speedup = runtime_orig / runtime_faster
print(f'speedup: {speedup:.4f}x')
This code printed the following (when I ran it in Google Colab):
orig: runtime: 0.4214s; result: -0.00961872402216293
fast: runtime: 0.1138s; result: -0.009618724022162932
speedup: 3.7030x
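Since the question also asked whether the computation can be vectorized: yes, the per-item Python loop can be removed entirely by de-meaning both lagged slices column-wise and reducing along the time axis. A sketch (my addition, not part of the answer above; the function name lag_corr_vectorized is made up):

```python
import numpy as np

def lag_corr_vectorized(A, lag):
    """Lag-`lag` autocorrelation of every column of A in one shot."""
    X = A[:-lag]                      # shape (days - lag, items)
    Y = A[lag:]
    Xc = X - X.mean(axis=0)           # de-mean each column
    Yc = Y - Y.mean(axis=0)
    num = (Xc * Yc).sum(axis=0)       # per-column unnormalized covariance
    den = np.sqrt((Xc ** 2).sum(axis=0) * (Yc ** 2).sum(axis=0))
    return num / den                  # shape (items,)

A = np.random.rand(100, 3000)
res = np.nanmean(lag_corr_vectorized(A, 5))
```

On top of removing the Python-level loop, this tends to beat both versions above, since all 3000 columns are handled by a handful of whole-array operations.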