Home > Back-end >  Python Numpy Conv confussion
Python Numpy Conv confussion

Time:09-23

I am trying to calculate the covariance matrix of two vectors a and b for which i am using numpys cov implementation R = np.cov(a,b). I got a little confussed when i noticed that np.cov(a,b)[0,0] != np.var(a) however i was able to find that, that had to do with biased vs unbiased estimators and is controlled by ddof.

However, that isn't the end of it. Why is R[0,1] != R[0,0]**0.5 * R[1,1]**0.5. Following my understanding and the definition of the covariance matrix on wikipedia https://en.wikipedia.org/wiki/Covariance_matrix

R[0,1] = R[1,0] = std(a) * std(b)
R[0,1] = R[1,0] = var(a)**0.5 * var(b)**0.5
R[0,1] = R[1,0] = R[0,0]**0.5 * R[1,1]**0.5

Where am i mistaken?

import numpy as np

rng = np.random.default_rng(seed=43)
a = rng.random((1,3))
b = rng.random((1,3))

R = np.cov(a,b,ddof=1)

print(R)
print('var a: '     str(np.var(a,ddof=1)))
print('var b: '     str(np.var(b,ddof=1)))
print('cov a,b: '   str(np.var(a,ddof=1)**0.5*np.var(b,ddof=1)**0.5))
print('cov a,b: '   str(R[0,0]**0.5*R[1,1]**0.5))
print('cov a,b: '   str(np.std(a,ddof=1)*np.std(b,ddof=1)))

I apologist in advance for any spelling or stack overflow ethnic mistakes on my part. Any help is appreciated.

CodePudding user response:

I'm not sure where the formula var(a)**0.5 * var(b)**0.5 is coming from, but it's not the formula I've seen for cross-covariance. I've seen this as the expected value of the products of x - mean_of_x and y - mean_of_y.

In a loop style (for clarity) this might look like:

a_mean = np.mean(a)
b_mean = np.mean(b)

s = 0
n = len(a[0])
for i, _ in enumerate(a[0]):
    s  = (a[0][i] - a_mean) * (b[0][i] - b_mean)
    
s / (n-1)
# 0.09175517729176114

In Numpy you can also do:

a_mean = np.mean(a)
b_mean = np.mean(b)

(a - a_mean) @ (b - b_mean).T / (n-1)
# array([[0.09175518]])

This corresponds to the values you get in the corners.

If you want to divide by n rather than n-2, you can pass in the bias arg to cov()

np.cov(a, b, bias=True)
# array([[0.08562558, 0.06117012],
         [0.06117012, 0.06361328]])

The corners here are the results you will get with the above code by dividing the results by 3 (n) rather than 2 (n-1)

  • Related