How to slice and calculate the pearson correlation coefficient between one big and small array with

Time:01-25

Suppose I have two very simple arrays with numpy:

import numpy as np
reference=np.array([0,1,2,3,0,0,0,7,8,9,10])
probe=np.zeros(3)

I would like to find which slice of reference has the highest Pearson correlation coefficient with probe. To do that, I slice reference into overlapping sub-arrays in a for loop, shifting by one element at a time, and compare each sub-array against probe. I did the slicing with the inelegant code below:

from statistics import correlation
for i in range(len(reference)):
    # get the slice of the data
    sliced_data = reference[i:i + len(probe)]
    # only calculate the correlation when probe and slice have the same number of elements
    if len(sliced_data) == len(probe):
        my_rho = correlation(sliced_data, probe)

I have one issue and one question about this code:

1- When I run the code, I get the error below:

my_rho = correlation(sliced_data, probe)
  File "/usr/lib/python3.10/statistics.py", line 919, in correlation
    raise StatisticsError('at least one of the inputs is constant')
statistics.StatisticsError: at least one of the inputs is constant

2- Is there a more elegant way of doing this slicing in Python?

CodePudding user response:

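The error is raised because Pearson's correlation is undefined when either input is constant: probe = np.zeros(3) has zero variance, and so does the reference window [0, 0, 0]. For the slicing, you can use sliding_window_view to get the successive windows. For a vectorized computation of the correlation, use a custom function: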

from numpy.lib.stride_tricks import sliding_window_view as swv

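def np_corr(X, y):
    # adapted from https://stackoverflow.com/a/71253141
    # X: 2D array of windows (one per row); y: 1D probe of the same width
    n = len(y)
    denom = np.sqrt((n * np.sum(X**2, axis=-1) - np.sum(X, axis=-1) ** 2)
                    * (n * np.sum(y**2) - np.sum(y) ** 2))
    num = n * np.sum(X * y[None, :], axis=-1) - np.sum(X, axis=-1) * np.sum(y)
    # out= gives a defined value (0) where denom == 0; without it those
    # entries are left uninitialized
    return np.divide(num, denom, out=np.zeros_like(denom), where=denom != 0)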

corr = np_corr(swv(reference, len(probe)), probe)

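Output (these values match a linearly increasing probe such as np.array([1, 2, 3]); with probe = np.zeros(3) every denominator is zero and no correlation is defined):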

array([ 1.        ,  1.        , -0.65465367, -0.8660254 ,  0.        ,
        0.8660254 ,  0.91766294,  1.        ,  1.        ])
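
To get back to the original goal (which slice correlates best with the probe), the windowed correlations can be cross-checked with a plain loop and the best window picked with an argmax. This is only a minimal sketch under stated assumptions, not the method from the answer above: the probe np.array([1, 2, 3]) is a hypothetical non-constant stand-in (it is not from the question), the helper name window_correlations is made up for illustration, and np.corrcoef is swapped in for statistics.correlation because it yields nan rather than raising for a constant window.

```python
import numpy as np

reference = np.array([0, 1, 2, 3, 0, 0, 0, 7, 8, 9, 10])
# Hypothetical non-constant probe (assumption, not from the question):
# np.zeros(3) has zero variance, so its correlation is undefined.
probe = np.array([1, 2, 3])


def window_correlations(reference, probe):
    """Pearson correlation of probe with every full-length window of reference."""
    n = len(probe)
    result = []
    # Stop early enough that every window has exactly n elements.
    for i in range(len(reference) - n + 1):
        window = reference[i:i + n]
        # A constant window gives 0/0 inside corrcoef; silence the
        # warning and keep the resulting nan as "undefined".
        with np.errstate(invalid="ignore", divide="ignore"):
            rho = np.corrcoef(window, probe)[0, 1]
        result.append(rho)
    return np.array(result)


corr = window_correlations(reference, probe)

# nanargmax skips undefined (nan) entries and returns the first maximum.
best = int(np.nanargmax(corr))
best_slice = reference[best:best + len(probe)]
print(best, best_slice, corr[best])
```

Several windows tie at a correlation of about 1 for this data, so np.nanargmax simply reports the first maximum it finds. If all tied windows are needed, comparing corr against its maximum with np.isclose would be one way to collect them.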