I currently have a dict called xrz_data that holds price data for around 100 different stock tickers, one DataFrame per ticker. The structure is as follows:
xrz_data
{'AAPL': Open High Low Close Adj Close \
Date
2021-12-31 178.089996 179.229996 177.259995 177.570007 176.838242
2022-01-03 177.830002 182.880005 177.710007 182.009995 181.259918
2022-01-04 182.630005 182.940002 179.119995 179.699997 178.959442
2022-01-05 179.610001 180.169998 174.639999 174.919998 174.199158
2022-01-06 172.699997 175.300003 171.639999 172.000000 171.291183
... ... ... ... ... ...
2022-10-13 134.990005 143.589996 134.369995 142.990005 142.990005
2022-10-14 144.309998 144.520004 138.190002 138.380005 138.380005
2022-10-17 141.070007 142.899994 140.270004 142.410004 142.410004
2022-10-18 145.490005 146.699997 140.610001 143.750000 143.750000
2022-10-19 141.690002 144.949997 141.500000 143.860001 143.860001
[202 rows x 6 columns],
'ABBV': Open High Low Close Adj Close \
Date
2021-12-31 136.039993 136.210007 135.300003 135.399994 130.322540
2022-01-03 135.410004 135.699997 133.509995 135.419998 130.341797
2022-01-04 135.330002 136.220001 134.380005 135.160004 130.091568
2022-01-05 135.000000 138.149994 135.000000 135.869995 130.774918
2022-01-06 136.399994 136.660004 135.160004 135.229996 130.158936
... ... ... ... ... ...
2022-10-13 136.699997 143.179993 136.270004 142.919998 142.919998
2022-10-14 142.600006 144.479996 142.210007 142.940002 142.940002
2022-10-17 142.649994 144.929993 142.100006 144.410004 144.410004
2022-10-18 145.229996 145.869995 143.529999 144.600006 144.600006
2022-10-19 144.800003 145.449997 142.309998 143.130005 143.130005
.....
Now I want to calculate the correlation between the Close columns of each pair of tickers. I am stuck on writing the loop that cycles through the dict to calculate the correlation.
I have tried using this:
temp_data = {}
for ticker in xrz_data:
    temp_data[ticker] = xrz_data[ticker].Close.values
for ticker in xrz_data:
    temp_data_2 = xrz_data[ticker].Close
    print(np.corrcoef(temp_data, temp_data_2)[0, 1])
However, here I am getting the error: ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1 and the array at index 1 has size 202
In addition, the code below works fine; my only problem is building the loop that applies it to each pair of tickers:
np.corrcoef(xrz_data['AAPL'].Close, xrz_data['MSFT'].Close)[0, 1]
output: 0.8640391277249728
Answer:
In your call to np.corrcoef, you are comparing temp_data, which is a dictionary, with temp_data_2, which is a Series.
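To make the mismatch concrete, here is a small sketch using made-up prices (the tickers "AAA" and "BBB" and their values are invented for illustration and are not from your data). Looking up one entry of the dictionary per ticker hands np.corrcoef two 1-D arrays of equal length, which is what it expects:

```python
import numpy as np

# Made-up closing prices, standing in for xrz_data[ticker].Close.values
temp_data = {
    "AAA": np.array([1.0, 2.0, 3.0, 4.0]),
    "BBB": np.array([2.0, 4.0, 6.0, 8.0]),
}

# Passing temp_data itself would hand np.corrcoef a dict rather than numbers.
# Indexing by ticker gives two 1-D arrays of the same length instead.
corr = np.corrcoef(temp_data["AAA"], temp_data["BBB"])[0, 1]
print(corr)
```

Because "BBB" is exactly twice "AAA" in this toy data, the printed correlation should be very close to 1.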
Minimal change here:
results = {}
for ticker1 in xrz_data:
    for ticker2 in xrz_data:
        corr = np.corrcoef(xrz_data[ticker1].Close, xrz_data[ticker2].Close)[0, 1]
        print(f"Correlation between {ticker1} and {ticker2}: {corr}")
        results[(ticker1, ticker2)] = corr
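As an alternative to the nested loop, pandas can compute all pairwise correlations in one call. The sketch below is only an illustration under assumptions: the xrz_data dictionary here is a synthetic stand-in with three invented tickers and made-up prices, and it assumes every DataFrame has a Close column and a date index, as in your printout.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for xrz_data: ticker -> DataFrame with a Close column.
# Tickers and prices are invented for illustration only.
dates = pd.date_range("2022-01-03", periods=5, freq="B")
xrz_data = {
    "AAA": pd.DataFrame({"Close": [10.0, 11.0, 12.0, 11.5, 13.0]}, index=dates),
    "BBB": pd.DataFrame({"Close": [20.0, 21.5, 23.0, 22.0, 25.0]}, index=dates),
    "CCC": pd.DataFrame({"Close": [5.0, 4.8, 4.5, 4.7, 4.1]}, index=dates),
}

# One column per ticker, aligned on the date index.
closes = pd.DataFrame({ticker: df["Close"] for ticker, df in xrz_data.items()})

# Pairwise Pearson correlation between every pair of columns.
corr_matrix = closes.corr()
print(corr_matrix)

# Look up a single pair from the matrix.
print(corr_matrix.loc["AAA", "BBB"])
```

The result is a symmetric ticker-by-ticker matrix, so each pair is computed once rather than twice as in the nested loop, and the diagonal (a ticker against itself) is 1. Since the columns are aligned on the index, this should also cope with tickers that are missing some dates, although that is worth checking against your real data.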