Concatenation axis must match exactly for np.corrcoef

I have two NumPy arrays. x is a 2-D array with 536 rows and 9 features (columns), and y is a 1-D array with 536 elements, as shown below:

>>> x.shape
(536, 9)
>>> y.shape
(536,)

I am trying to find the correlation coefficients between x and y.

>>> np.corrcoef(x,y)

Here's the error I am seeing.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in corrcoef
  File "/opt/anaconda3/lib/python3.9/site-packages/numpy/lib/function_base.py", line 2683, in corrcoef
    c = cov(x, y, rowvar, dtype=dtype)
  File "<__array_function__ internals>", line 5, in cov
  File "/opt/anaconda3/lib/python3.9/site-packages/numpy/lib/function_base.py", line 2477, in cov
    X = np.concatenate((X, y), axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 9 and the array at index 1 has size 536

I can't figure out what shapes these two arrays should have.

CodePudding user response：

To give both arrays the same shape, you can use numpy.broadcast_to:

y1 = np.broadcast_to(y[:, None], x.shape)
#alternative solution
y1 = np.repeat(y[:, None], x.shape[1],1)
print (np.corrcoef(x,y1))

Sample:

np.random.seed(1609)
    
x = np.random.random((5,3))
y = np.random.random((5))
print (x)
[[3.28341891e-01 9.10078695e-01 6.25727436e-01]
 [9.52999512e-01 3.54590864e-02 4.19920842e-01]
 [2.46229526e-02 3.60903454e-01 9.96143110e-01]
 [8.87331773e-01 8.34857105e-04 6.36058323e-01]
 [2.91490345e-01 5.01580494e-01 3.23455182e-01]]

print (y)
[0.60437973 0.74687751 0.68819022 0.19104546 0.68420365]

y1 = np.broadcast_to(y[:, None], x.shape)
print(y1)
[[0.60437973 0.60437973 0.60437973]
 [0.74687751 0.74687751 0.74687751]
 [0.68819022 0.68819022 0.68819022]
 [0.19104546 0.19104546 0.19104546]
 [0.68420365 0.68420365 0.68420365]]

print (np.corrcoef(x,y1))
[[ 1.00000000e+00 -9.96776982e-01  3.52933703e-01 -9.66910777e-01
   9.23044315e-01             nan -3.11624591e-16             nan
              nan  3.11624591e-16]
 [-9.96776982e-01  1.00000000e+00 -4.26856227e-01  9.43328464e-01
  -8.89208247e-01             nan  0.00000000e+00             nan
              nan  0.00000000e+00]
 [ 3.52933703e-01 -4.26856227e-01  1.00000000e+00 -1.02557684e-01
  -3.41645099e-02             nan  9.18680698e-17             nan
              nan -9.18680698e-17]
 [-9.66910777e-01  9.43328464e-01 -1.02557684e-01  1.00000000e+00
  -9.90642527e-01             nan -9.92012638e-17             nan
              nan  9.92012638e-17]
 [ 9.23044315e-01 -8.89208247e-01 -3.41645099e-02 -9.90642527e-01
   1.00000000e+00             nan  6.00580887e-16             nan
              nan -6.00580887e-16]
 [            nan             nan             nan             nan
              nan             nan             nan             nan
              nan             nan]
 [-3.11624591e-16  0.00000000e+00  9.18680698e-17 -9.92012638e-17
   6.00580887e-16             nan  1.00000000e+00             nan
              nan -1.00000000e+00]
 [            nan             nan             nan             nan
              nan             nan             nan             nan
              nan             nan]
 [            nan             nan             nan             nan
              nan             nan             nan             nan
              nan             nan]
 [ 3.11624591e-16  0.00000000e+00 -9.18680698e-17  9.92012638e-17
  -6.00580887e-16             nan -1.00000000e+00             nan
              nan  1.00000000e+00]]
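
A note on reading the output above: by default np.corrcoef treats each row as a variable (rowvar=True), so the result is 10 x 10 (5 rows of x plus 5 rows of y1), and the rows of y1 are constant, which is where the nan entries come from. If the goal is the correlation of each column of x with y, one possible sketch is to pass rowvar=False instead of broadcasting. The seed and the names full and feature_vs_y are illustrative assumptions, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
x = rng.random((5, 3))          # 5 observations, 3 features
y = rng.random(5)               # 5 observations of the target

# rowvar=False tells corrcoef that columns, not rows, are the variables.
# x contributes 3 variables and y contributes 1, so the result is (4, 4).
full = np.corrcoef(x, y, rowvar=False)

# The last row holds the correlations involving y; dropping the final entry
# (y with itself) leaves one coefficient per column of x.
feature_vs_y = full[-1, :-1]

print(full.shape)
print(feature_vs_y)
```

With the shapes from the question, (536, 9) and (536,), the same call should give a 10 x 10 matrix whose last row contains the 9 coefficients of interest.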

CodePudding user response：

@Jezrael pretty much answered my question. An alternative approach is to create an array of zeros with one entry per feature (9 here) and fill it with the correlation coefficient of each feature in x with y, computed one column at a time.

# number of columns (features) in x, 9 here
n_features = x.shape[1]
coeffs = np.zeros(n_features)

for feature in range(n_features):
    # corrcoef on two 1-d arrays returns a (2, 2) array with 1s on the diagonal
    # and the coefficient at [0, 1] and [1, 0]
    coeffs[feature] = np.corrcoef(x[:, feature], y)[0, 1]
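
For reference, the loop above can also be written without calling corrcoef once per column, by applying the Pearson formula to all columns at once. The following is a minimal sketch under the assumption that no column of x is constant (a constant column would give a zero denominator); column_correlations is a hypothetical helper name, and the random data only stands in for the real x and y.

```python
import numpy as np

def column_correlations(x, y):
    """Pearson correlation of each column of x with the 1-d array y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean(axis=0)   # center every column of x
    yc = y - y.mean()         # center y
    numerator = xc.T @ yc     # per-column sum of products, shape (n_features,)
    denominator = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return numerator / denominator

# Stand-in data with the shapes from the question.
rng = np.random.default_rng(1)
x = rng.random((536, 9))
y = rng.random(536)

coeffs = column_correlations(x, y)

# Cross-check against the per-column corrcoef loop.
loop = np.array([np.corrcoef(x[:, j], y)[0, 1] for j in range(x.shape[1])])
print(np.allclose(coeffs, loop))
```

For 9 columns the loop is perfectly adequate; the vectorized form mainly matters when the number of features is large.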