Calculate correlation between different datasets of dict object


I currently have a dict object called xrz_data containing stock data for around 100 different tickers. It is structured as follows:


{'AAPL':                   Open        High         Low       Close   Adj Close  \
 2021-12-31  178.089996  179.229996  177.259995  177.570007  176.838242   
 2022-01-03  177.830002  182.880005  177.710007  182.009995  181.259918   
 2022-01-04  182.630005  182.940002  179.119995  179.699997  178.959442   
 2022-01-05  179.610001  180.169998  174.639999  174.919998  174.199158   
 2022-01-06  172.699997  175.300003  171.639999  172.000000  171.291183   
 ...                ...         ...         ...         ...         ...   
 2022-10-13  134.990005  143.589996  134.369995  142.990005  142.990005   
 2022-10-14  144.309998  144.520004  138.190002  138.380005  138.380005   
 2022-10-17  141.070007  142.899994  140.270004  142.410004  142.410004   
 2022-10-18  145.490005  146.699997  140.610001  143.750000  143.750000   
 2022-10-19  141.690002  144.949997  141.500000  143.860001  143.860001

 [202 rows x 6 columns],
 'ABBV':                   Open        High         Low       Close   Adj Close  \
 2021-12-31  136.039993  136.210007  135.300003  135.399994  130.322540   
 2022-01-03  135.410004  135.699997  133.509995  135.419998  130.341797   
 2022-01-04  135.330002  136.220001  134.380005  135.160004  130.091568   
 2022-01-05  135.000000  138.149994  135.000000  135.869995  130.774918   
 2022-01-06  136.399994  136.660004  135.160004  135.229996  130.158936   
 ...                ...         ...         ...         ...         ...   
 2022-10-13  136.699997  143.179993  136.270004  142.919998  142.919998   
 2022-10-14  142.600006  144.479996  142.210007  142.940002  142.940002   
 2022-10-17  142.649994  144.929993  142.100006  144.410004  144.410004   
 2022-10-18  145.229996  145.869995  143.529999  144.600006  144.600006   
 2022-10-19  144.800003  145.449997  142.309998  143.130005  143.130005 


Now I want to calculate the correlation between the Close columns of each pair of tickers. I am stuck on writing the loop that cycles through the dict object to calculate the correlations.

I have tried using this:

temp_data = {}

for ticker in xrz_data:
    temp_data[ticker] = xrz_data[ticker].Close.values
    for ticker in xrz_data:
        temp_data_2 = xrz_data[ticker].Close
        print (np.corrcoef(temp_data, temp_data_2)[0, 1])

However, this gives me the error: ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1 and the array at index 1 has size 202

In addition, the code below works fine for a single pair; my problem is only constructing the loop that applies it to each pair of tickers:

np.corrcoef(xrz_data['AAPL'].Close, xrz_data['MSFT'].Close)[0, 1]

output: 0.8640391277249728

Answer:

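In your call to np.corrcoef, you are passing temp_data, which is the whole dictionary, together with temp_data_2, which is a single Series. The inner loop also reuses the name ticker, so it overwrites the outer loop variable.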

Minimal change here:

import numpy as np

results = {}

# Compare each ticker's Close series with every ticker's Close series.
for ticker1 in xrz_data:
    for ticker2 in xrz_data:
        corr = np.corrcoef(xrz_data[ticker1].Close, xrz_data[ticker2].Close)[0, 1]
        print(f"Correlation between {ticker1} and {ticker2}: {corr}")
        results[(ticker1, ticker2)] = corr
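
As an alternative to the nested loop, the Close columns could be collected into a single DataFrame and passed to pandas' own DataFrame.corr(), which returns the whole ticker-by-ticker matrix at once. This is only a sketch: the small xrz_data below is a made-up stand-in with hypothetical tickers (AAA, BBB, CCC) and invented prices, assuming each value in the real dict is a DataFrame with a date index and a Close column.

```python
import pandas as pd

# Hypothetical stand-in for xrz_data: a dict of DataFrames, each with a "Close" column.
dates = pd.date_range("2022-01-03", periods=5, freq="B")
xrz_data = {
    "AAA": pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates),
    "BBB": pd.DataFrame({"Close": [2.0, 4.0, 6.0, 8.0, 10.0]}, index=dates),
    "CCC": pd.DataFrame({"Close": [5.0, 4.0, 3.0, 2.0, 1.0]}, index=dates),
}

# Build one column per ticker; the Series are aligned on their date index.
closes = pd.DataFrame({ticker: df["Close"] for ticker, df in xrz_data.items()})

# Pairwise Pearson correlation between all columns (tickers x tickers).
corr_matrix = closes.corr()
print(corr_matrix)
```

A single pair can then be looked up with corr_matrix.loc["AAA", "BBB"]. Unlike np.corrcoef, which needs arrays of equal length, DataFrame.corr() aligns the series by date and ignores rows where either value is missing, so the two approaches may give slightly different numbers if the tickers do not cover exactly the same dates.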