numpy covariance calculation (einsum potential?)-CodePudding

I am trying to write out a covariance calculation for the following example, and i know there has to be a better way than a for loop. I've looked into np.dot, np.einsum and i feel like np.einsum has the capability but i am just missing something for implementing it.

import numpy as np

# this is mx3
a = np.array([[1,2,3],[4,5,6]])
# this is x3
mean = a.mean(axis=0)

# result should be 3x3
b = np.zeros((3,3))
for i in range(a.shape[0]):
    b = b   (a[i]-mean).reshape(3,1) * (a[i]-mean)

b
array([[4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5]])

so this is fine for a 2 data point sample but for a m=large number this is super slow. There has to be a better way. Any suggestions?

CodePudding user response：

In [108]: a = np.array([[1,2,3],[4,5,6]])
     ...: # this is x3
     ...: mean = a.mean(axis=0)
     ...: 
     ...: # result should be 3x3
     ...: b = np.zeros((3,3))
     ...: for i in range(a.shape[0]):
     ...:     b = b   (a[i]-mean).reshape(3,1) * (a[i]-mean)
     ...: 
In [109]: b
Out[109]: 
array([[4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5]])
In [110]: a.mean(axis=0)
Out[110]: array([2.5, 3.5, 4.5])

Since the mean is subtracted twice, lets define a new variable. In this case the 2d and 1d dimensions broadcast, so we can simply:

In [111]: a1= a - a.mean(axis=0)
In [112]: a1
Out[112]: 
array([[-1.5, -1.5, -1.5],
       [ 1.5,  1.5,  1.5]])

The rest is a normal dot product:

In [113]: a1.T@a1
Out[113]: 
array([[4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5]])

np.einsum and np.dot can also do this matrix multiplication.