Essentially, I am trying to perform simple linear regressions on daily stock returns to figure out which stocks have the highest degree of mean reversion. My code pulls in S&P500 daily returns into a data frame, then creates a lagged column for each ticker.
import yfinance as yf
import numpy as np
import pandas as pd
from datetime import date, timedelta
from sklearn.linear_model import LinearRegression
enddt = date.today()
startdt = end - timedelta(days=90)
symbols = ['MMM', 'AOS', 'ABT']
data = yf.download(" ".join(symbols), start= startdt,end=enddt)
daily_returns = data['Adj Close'].pct_change()
df2 = pd.DataFrame(daily_returns)
for symbol in symbols:
    df2[f'{symbol}_lag'] = df2[symbol].shift(1)
df3 = df2.drop(df2.index[[0,1]])
display(df3.head())
I started with a basic linear regression:
x = np.array(df3.MMM).reshape((-1,1))
y = np.array(df3.MMM_lag)
model = LinearRegression().fit(x,y)
print(f"R^2: {model.score(x, y)}")
print(f"intercept: {model.intercept_}")
print(f"slope: {model.coef_}")
The code above works, but ideally, I would like to pull in 400 tickers, and I don't want to type out each regression.
CodePudding user response:
You can use a for loop to fit multiple linear regression models and slice your pd.DataFrame to the relevant columns using [].
I simplified your code a little bit, as it contained variables that were not defined.
import yfinance as yf
import numpy as np
import pandas as pd
from datetime import date, timedelta
from sklearn.linear_model import LinearRegression
enddt = date.today()
startdt = enddt - timedelta(days=90)
symbols = ["MMM", "AOS", "ABT"]
data = yf.download(" ".join(symbols), start=startdt, end=enddt)
# construct feature matrix and target
X = data["Adj Close"].pct_change()
y = X.shift(1)
# drop first two rows
X = X.iloc[2:, :]
y = y.iloc[2:, :]
for symbol in symbols:
    X_sym = X[symbol].values.reshape(-1, 1)
    y_sym = y[symbol]
    model = LinearRegression().fit(X_sym, y_sym)
    print(f"R^2: {model.score(X_sym,y_sym)}")
    print(f"intercept: {model.intercept_}")
    print(f"slope: {model.coef_}")
Output:
[*********************100%***********************] 3 of 3 completed
R^2: 1.291986384543975e-08
intercept: 0.0016325991063324728
slope: [0.00011383]
R^2: 0.0015475148976311637
intercept: 0.00237810444818736
slope: [0.03956798]
R^2: 0.001977249242221757
intercept: 0.0017093384700974568
slope: [0.04447145]
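With 400 tickers, printed output gets hard to read, so it may help to collect the results in a table and sort them. Below is a minimal sketch of that idea, not a drop-in replacement. The helper name lag1_regressions and the tickers AAA and BBB are hypothetical, the returns are synthetic so the example runs without a download, and np.polyfit is swapped in for LinearRegression to keep the dependencies to NumPy and pandas. It regresses today's return on yesterday's, which is the reverse of the orientation in the loop above; with a single regressor the R^2 is the same either way.

```python
import numpy as np
import pandas as pd


def lag1_regressions(returns: pd.DataFrame) -> pd.DataFrame:
    """Regress each column's return on its own one-day lag.

    Returns one row per ticker with slope, intercept and R^2.
    (Hypothetical helper, written for illustration only.)
    """
    rows = []
    for symbol in returns.columns:
        # Pair today's return with yesterday's and drop incomplete rows.
        pair = pd.DataFrame(
            {"today": returns[symbol], "lag": returns[symbol].shift(1)}
        ).dropna()
        x = pair["lag"].to_numpy()
        y = pair["today"].to_numpy()
        # Least-squares line through (x, y); polyfit returns [slope, intercept].
        slope, intercept = np.polyfit(x, y, deg=1)
        # For a single regressor, R^2 equals the squared correlation.
        r2 = np.corrcoef(x, y)[0, 1] ** 2
        rows.append(
            {"symbol": symbol, "slope": slope, "intercept": intercept, "r2": r2}
        )
    # Most negative slope first: the strongest one-day reversal on this measure.
    return pd.DataFrame(rows).set_index("symbol").sort_values("slope")


# Synthetic returns standing in for data["Adj Close"].pct_change().
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, size=(500, 2))
aaa = np.zeros(500)
for t in range(1, 500):
    aaa[t] = -0.5 * aaa[t - 1] + noise[t, 0]  # built to mean-revert
fake_returns = pd.DataFrame({"AAA": aaa, "BBB": noise[:, 1]})

results = lag1_regressions(fake_returns)
print(results)
```

With real data you would pass data["Adj Close"].pct_change() instead of fake_returns. Sorting by slope is only one possible ranking; given R^2 values as small as those in the output above, the differences between tickers may not be statistically meaningful.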