In Python, is there a way to use a for loop to perform a linear regression multiple times with diffe

Time:12-30

Essentially, I am trying to perform simple linear regressions on daily stock returns to figure out which stocks show the highest degree of mean reversion. My code pulls S&P 500 daily returns into a DataFrame and then creates a lagged column for each ticker.

import yfinance as yf
import numpy as np
import pandas as pd

from datetime import date, timedelta
from sklearn.linear_model import LinearRegression

enddt = date.today()
startdt = end - timedelta(days=90)

symbols = ['MMM', 'AOS', 'ABT']

data = yf.download(" ".join(symbols), start= startdt,end=enddt)
daily_returns = data['Adj Close'].pct_change()
df2 = pd.DataFrame(daily_returns)

for symbol in symbols:
    df2[f'{symbol}_lag'] = df2[symbol].shift(1)

df3 = df2.drop(df2.index[[0,1]])

display(df3.head())
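To see why the first two rows get dropped, here is a small sketch on made-up prices (the tickers AAA and BBB and their prices are hypothetical, not downloaded data). pct_change() leaves row 0 as NaN because there is no previous price, and shift(1) pushes that NaN down one row, so every lag column is also NaN in row 1.

```python
import pandas as pd

# Hypothetical prices for two made-up tickers (not downloaded data)
prices = pd.DataFrame(
    {"AAA": [100.0, 101.0, 99.0, 100.0], "BBB": [50.0, 50.5, 51.0, 50.0]}
)

returns = prices.pct_change()  # row 0 is NaN: there is no previous price
for col in list(returns.columns):
    # Same pattern as the question: one lagged column per ticker
    returns[f"{col}_lag"] = returns[col].shift(1)  # NaN in rows 0 and 1

print(returns)

# Here dropna() removes exactly the first two rows, like df2.drop(df2.index[[0, 1]])
clean = returns.dropna()
print(clean)
```

With hundreds of tickers, keep in mind that dropna() drops a date if any single ticker is missing data on it, whereas dropping the first two rows by position does not.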

I started with a basic linear regression:

x = np.array(df3.MMM).reshape((-1,1))
y = np.array(df3.MMM_lag)

model = LinearRegression().fit(x,y)

print(f"R^2: {model.score_}")
print(f"intercept: {model.intercept_}")
print(f"slope: {model.coef_}")

The code above works, but ideally, I would like to pull in 400 tickers, and I don't want to type out each regression.
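For reference, the per-ticker regression being described can be written as an AR(1)-style model on returns, with today's return as the target and yesterday's as the feature. This is a standard formulation offered as a sketch, not the asker's exact specification. Note that the code in the question (and in the answer) puts today's return in x and the lagged return in y; that leaves R^2 unchanged, since R^2 is the squared correlation and is symmetric in x and y, but it changes which variance divides the covariance in the slope.

```latex
r_{i,t} = \alpha_i + \beta_i \, r_{i,t-1} + \varepsilon_{i,t}

\hat{\beta}_i = \frac{\sum_t \left(r_{i,t-1} - \bar{r}_{i,\mathrm{lag}}\right)\left(r_{i,t} - \bar{r}_i\right)}
                     {\sum_t \left(r_{i,t-1} - \bar{r}_{i,\mathrm{lag}}\right)^2},
\qquad
R_i^2 = \hat{\rho}_i^{\,2}
```

Here \hat{\rho}_i is the sample correlation between r_{i,t} and r_{i,t-1}. A negative \hat{\beta}_i suggests mean reversion (an up day tends to be followed by a down day), while a positive one suggests short-term momentum, so sorting tickers by \hat{\beta}_i from most negative upward is one way to rank them.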

Answer:

You can use a for loop to fit one linear regression model per ticker, using [] to slice the pd.DataFrame down to that ticker's column (X[symbol] below).

I simplified your code a little, as it referenced a variable that was never defined (end) and an attribute that does not exist (model.score_; score is a method that takes X and y).

import yfinance as yf
import numpy as np
import pandas as pd

from datetime import date, timedelta
from sklearn.linear_model import LinearRegression

enddt = date.today()
startdt = enddt - timedelta(days=90)

symbols = ["MMM", "AOS", "ABT"]

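# auto_adjust=False keeps the "Adj Close" column (newer yfinance versions default to auto_adjust=True)
data = yf.download(" ".join(symbols), start=startdt, end=enddt, auto_adjust=False)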

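# feature: daily returns; target: the same returns lagged by one day
X = data["Adj Close"].pct_change()
y = X.shift(1)

# drop the first two rows: row 0 is NaN after pct_change(),
# and row 1 of y is NaN after shift(1)
X = X.iloc[2:, :]
y = y.iloc[2:, :]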

for symbol in symbols:
    # sklearn expects a 2-D feature array of shape (n_samples, 1)
    X_sym = X[symbol].values.reshape(-1, 1)
    y_sym = y[symbol]
    model = LinearRegression().fit(X_sym, y_sym)

    print(f"R^2: {model.score(X_sym, y_sym)}")
    print(f"intercept: {model.intercept_}")
    print(f"slope: {model.coef_}")

Output (one group of three lines per ticker, in the order MMM, AOS, ABT):

[*********************100%***********************]  3 of 3 completed
R^2: 1.291986384543975e-08
intercept: 0.0016325991063324728
slope: [0.00011383]
R^2: 0.0015475148976311637
intercept: 0.00237810444818736
slope: [0.03956798]
R^2: 0.001977249242221757
intercept: 0.0017093384700974568
slope: [0.04447145]
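Since the goal is to rank tickers by mean reversion, one option is to collect each fit into a DataFrame and sort by slope instead of printing. The sketch below uses simulated AR(1) returns in place of yfinance data so that it runs offline; the tickers REV, FLAT and MOM, the helper simulate_ar1, and the chosen coefficients are all hypothetical. It regresses today's return on yesterday's, so a negative slope suggests mean reversion. It also checks the loop against the closed-form single-regressor slope, which pandas can compute for every column at once without a loop.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def simulate_ar1(beta, n, rng):
    """Hypothetical helper: simulate r_t = beta * r_{t-1} + noise."""
    r = np.zeros(n)
    noise = rng.normal(0.0, 0.01, size=n)
    for t in range(1, n):
        r[t] = beta * r[t - 1] + noise[t]
    return r


rng = np.random.default_rng(0)
# Hypothetical tickers with known lag-1 coefficients (synthetic, not market data)
true_betas = {"REV": -0.5, "FLAT": 0.0, "MOM": 0.4}
returns = pd.DataFrame({s: simulate_ar1(b, 2000, rng) for s, b in true_betas.items()})

rows = []
for symbol in returns.columns:
    # Target: today's return; feature: yesterday's return. Drop the NaN row from shift(1).
    pair = pd.DataFrame(
        {"today": returns[symbol], "yesterday": returns[symbol].shift(1)}
    ).dropna()
    X_sym = pair[["yesterday"]].to_numpy()  # 2-D, shape (n_samples, 1)
    y_sym = pair["today"].to_numpy()
    model = LinearRegression().fit(X_sym, y_sym)
    rows.append(
        {
            "symbol": symbol,
            "slope": model.coef_[0],
            "intercept": model.intercept_,
            "r2": model.score(X_sym, y_sym),
        }
    )

# Most negative slope first, i.e. strongest apparent mean reversion at the top
results = pd.DataFrame(rows).set_index("symbol").sort_values("slope")
print(results)

# Closed-form OLS slope for one regressor, computed for every column at once:
# slope = sum((x - mean x) * (y - mean y)) / sum((x - mean x) ** 2)
y_all = returns.iloc[1:]
x_all = returns.shift(1).iloc[1:]
xc = x_all - x_all.mean()
yc = y_all - y_all.mean()
closed_form = (xc * yc).sum() / (xc**2).sum()
print(closed_form.sort_values())
```

With real data, returns would come from data["Adj Close"].pct_change() and the first two rows would need dropping, as in the answer. With roughly 60 trading days per ticker over a 90-day window, slope estimates are noisy, so small differences in the ranking may not be meaningful.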