Vectorization for computing variance of a vector split at different points

I have a 1-D array arr and I need to compute the variance of all possible contiguous subvectors that begin at position 0. It may be easier to understand with a for loop:

import numpy as np

np.random.seed(1)
arr = np.random.normal(size=100)

res = []   
for i in range(1, arr.size + 1):
    subvector = arr[:i]
    var = np.var(subvector)
    res.append(var)

Is there any way to compute res without the for loop?

CodePudding user response：

Yes. Since var = sum_squares / N - mean**2 and mean = sum / N, you can use np.cumsum to get the running sums:

cumsum = np.cumsum(arr)
cummean = cumsum / (np.arange(len(arr)) + 1)
sq = np.cumsum(arr**2)

# population variance (ddof=0), matching np.var's default;
# adjust the divisor here if you need a different ddof
cumvar = sq / (np.arange(len(arr)) + 1) - cummean**2

np.allclose(res, cumvar)
# True
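
The comment about degrees of freedom can be made concrete. Below is a minimal sketch, assuming you want to choose ddof the way np.var allows; the helper name cumulative_var is hypothetical and not part of NumPy. It rescales the population variance by n / (n - ddof) and leaves NaN for prefixes that are too short.

```python
import numpy as np

def cumulative_var(arr, ddof=0):
    """Variance of arr[:i] for i = 1..len(arr), without a Python loop.

    Hypothetical helper for illustration; not a NumPy function.
    """
    arr = np.asarray(arr, dtype=float)
    n = np.arange(1, arr.size + 1)       # prefix lengths 1..N
    mean = np.cumsum(arr) / n            # running mean
    mean_sq = np.cumsum(arr**2) / n      # running mean of squares
    pop_var = mean_sq - mean**2          # population variance (ddof=0)

    # Rescale by n / (n - ddof); prefixes with n <= ddof stay NaN.
    out = np.full(arr.size, np.nan)
    valid = n > ddof
    out[valid] = pop_var[valid] * n[valid] / (n[valid] - ddof)
    return out

np.random.seed(1)
arr = np.random.normal(size=100)

sample_var = cumulative_var(arr, ddof=1)
# Compare against the loop, skipping the first prefix (undefined for ddof=1).
expected = [np.var(arr[:i], ddof=1) for i in range(2, arr.size + 1)]
print(np.allclose(sample_var[1:], expected))
```

Note that the sum-of-squares formula can lose precision when the mean is large relative to the spread of the data, so treat this as a sketch rather than a numerically robust implementation.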

CodePudding user response：

With pandas, you could use expanding:

import pandas as pd
pd.Series(arr).expanding().var(ddof=0).values

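NB. pandas' var defaults to ddof=1, whereas np.var defaults to ddof=0, so ddof=0 is passed above to match the loop. One advantage of this approach is that you can use the parameters of var directly, and of course expanding() supports many other methods.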
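
As a rough illustration of the point about ddof and other methods, the sketch below compares the expanding variance with the loop from the question and applies the same pattern to the mean. The variable names are chosen for this example only.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
arr = np.random.normal(size=100)
s = pd.Series(arr)

var_pop = s.expanding().var(ddof=0).to_numpy()   # population variance, like np.var
var_sample = s.expanding().var().to_numpy()      # pandas default, ddof=1
run_mean = s.expanding().mean().to_numpy()       # same pattern, different statistic

# Reference result from the explicit loop in the question.
loop_var = [np.var(arr[:i]) for i in range(1, arr.size + 1)]
print(np.allclose(var_pop, loop_var))

# With ddof=1 a single observation has no sample variance, so the first entry is NaN.
print(np.isnan(var_sample[0]))
```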