I am working with stock data coming from Yahoo Finance.
def load_y_finance_data(y_finance_tickers: list):
df = pd.DataFrame()
print("Loading Y-Finance data ...")
for ticker in y_finance_tickers:
df[ticker.replace("^", "")] = yf.download(
ticker,
auto_adjust=True, # only download adjusted data
progress=False,
)["Close"]
print("Done loading Y-Finance data!")
return df
x = load_y_finance_data(["^VIX", "^GSPC"])
x
VIX GSPC
Date
1990-01-02 17.240000 359.690002
1990-01-03 18.190001 358.760010
1990-01-04 19.219999 355.670013
1990-01-05 20.110001 352.200012
1990-01-08 20.260000 353.790009
DataSize=(8301, 2)
Here I want to perform a sliding window operation for every 50 days period, where I want to get correlation (using corr() function) for 50 days slice (day_1 to day_50) of data and after window will move by one day (day_2 to day_51) and so on.
I tried the naive way of using a for loop to do this and it works as well. But it takes too much time. Code below-
data_size = len(x)
period = 50
df = pd.DataFrame()
for i in range(data_size-period):
df.loc[i, "GSPC_VIX_corr"] = x[["GSPC", "VIX"]][i:i period].corr().loc["GSPC", "VIX"]
df
GSPC_VIX_corr
0 -0.703156
1 -0.651513
2 -0.602876
3 -0.583256
4 -0.589086
How can I do this more efficiently? Is there any built-in way I can use? Thanks :)
CodePudding user response:
You can use the rolling windows functionality of Pandas with many different aggreggations, including corr(). Instead of your for loop, do this:
x["VIX"].rolling(window=period).corr(x["GSPC"])