Computing correlation between matrices: R and Python return different results

Time: 03-19

Suppose matrices X and Y have sizes 2x3 and 2x2, respectively. The R function 'cor' returns a 3x2 matrix, while the Python function numpy.corrcoef returns a 5x5 matrix. Examples below:

R:

X<-matrix(c(0.2,0.5,0.1,0.7,0.5,0.3), nrow=2, ncol=3)
Y<-matrix(c(0.2,0.3,0.6,0.7), nrow=2)
cor(X,Y)
     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]   -1   -1

Python:

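import numpy as np

# Transpose so that, as in R, each column is a variable and each row an observation
X = np.array([[0.2,0.5], [0.1, 0.7], [0.5,0.3]], ndmin=2).T   # shape (2, 3)
Y = np.array([[0.2,0.3],[0.6,0.7]], ndmin=2).T                # shape (2, 2)
corr = np.corrcoef(X, Y, rowvar=False)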

array([[ 1.,  1., -1.,  1.,  1.],
       [ 1.,  1., -1.,  1.,  1.],
       [-1., -1.,  1., -1., -1.],
       [ 1.,  1., -1.,  1.,  1.],
       [ 1.,  1., -1.,  1.,  1.]])
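The 5x5 shape comes from how np.corrcoef handles its second argument: with rowvar=False, every column of X and every column of Y is treated as a separate variable, and the two sets are stacked, giving 3 + 2 = 5 variables. A minimal check of that equivalence, redefining the question's data so the snippet runs on its own (the names corr_pair and corr_stacked are just illustrative):

```python
import numpy as np

# Same data as in the question: columns are variables, rows are observations.
X = np.array([[0.2, 0.5], [0.1, 0.7], [0.5, 0.3]]).T  # shape (2, 3)
Y = np.array([[0.2, 0.3], [0.6, 0.7]]).T              # shape (2, 2)

# Passing X and Y separately with rowvar=False...
corr_pair = np.corrcoef(X, Y, rowvar=False)

# ...gives the same result as correlating all columns of the side-by-side matrix [X | Y].
corr_stacked = np.corrcoef(np.hstack([X, Y]), rowvar=False)

print(corr_pair.shape)                        # (5, 5)
print(np.allclose(corr_pair, corr_stacked))   # True
```

So rows and columns 0 to 2 of the 5x5 result belong to X's columns, and rows and columns 3 to 4 belong to Y's columns.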

How can I get Python to return a 3x2 matrix like R does? Or, how should I select the correct values from Python's 5x5 matrix so that they match R's result?

Answer:

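In R, when x and y are matrices, cor(x, y) returns the correlations of the columns of x (3 here) with the columns of y (2 here). np.corrcoef(X, Y, rowvar=False) instead stacks the columns of X and Y into one set of 3 + 2 = 5 variables and returns the full 5x5 correlation matrix. The block that matches R's result is the one whose rows are X's columns (indices 0 to 2) and whose columns are Y's columns (indices 3 and 4). Pick the block carefully: with only two observations, every correlation in this example is 1 or -1, so a wrong block (for example X against X) would look the same here.

np.corrcoef(X, Y, rowvar=False)[0:3, 3:5]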

array([[ 1.,  1.],
       [ 1.,  1.],
       [-1., -1.]])
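The hard-coded indices above only fit this 3-column / 2-column case. Below is a minimal sketch of two hypothetical helpers (r_cor and r_cor_direct are illustrative names, not library functions) that reproduce R's default Pearson cor(x, y) for matrices whose rows are observations. The first slices the cross block using X's shape; the second computes the same block directly from column-standardized data. Both assume no missing values and no constant columns; R's use= and method= options are not replicated.

```python
import numpy as np

def r_cor(X, Y):
    """Hypothetical helper: Pearson correlation of X's columns vs Y's columns, like R's cor(X, Y)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    p = X.shape[1]                           # number of variables (columns) in X
    full = np.corrcoef(X, Y, rowvar=False)   # (p + q) x (p + q) matrix over all columns
    return full[:p, p:]                      # rows: X's columns, cols: Y's columns -> p x q

def r_cor_direct(X, Y):
    """Same p x q block without building the full matrix: z-score columns, average cross-products."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]                           # number of observations (rows)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Yz = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    return Xz.T @ Yz / (n - 1)

# Data from the question: columns are variables, rows are observations.
X = np.array([[0.2, 0.5], [0.1, 0.7], [0.5, 0.3]]).T  # shape (2, 3)
Y = np.array([[0.2, 0.3], [0.6, 0.7]]).T              # shape (2, 2)

print(r_cor(X, Y))
print(np.allclose(r_cor(X, Y), r_cor_direct(X, Y)))  # True
```

Slicing is the simplest option but computes the whole (p + q) x (p + q) matrix; the direct version only computes the p x q block, which can matter when X and Y have many columns.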