How to find coefficients for LinearRegression problem with Pipeline and GridSearchCV

Time:10-23

I'm fitting a LinearRegression model with a Pipeline and GridSearchCV, but I cannot manage to access the coefficients that are calculated for each feature of X_train.

mlr_gridsearchcv = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('gridsearchcv_lr', GridSearchCV(
        TransformedTargetRegressor(regressor=LinearRegression(),
                                   func=np.log, inverse_func=np.exp),
        param_grid=parameter_lr, cv=nfolds,
        scoring=('r2', 'neg_mean_absolute_error'), return_train_score=True,
        refit='neg_mean_absolute_error', n_jobs=-1)),
])

mlr_co2=mlr_gridsearchcv.fit(X_train,Y_train['co2e'])

I've tried to get best_estimator_ first:

mlr_co2.named_steps['gridsearchcv_lr'].cv_results_.best_estimator_

and I get:

AttributeError: 'dict' object has no attribute 'best_estimator_'

If I try this way:

mlr_co2.named_steps['gridsearchcv_lr'].best_estimator_.regressor.coef_

I get:

AttributeError: 'LinearRegression' object has no attribute 'coef_'

I tried other combinations but nothing seems to work.

Answer:

You can use:

results['gridsearchcv'].best_estimator_.regressor_.coef_

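where results is the fitted pipeline and 'gridsearchcv' is the name of the grid search step in the pipeline. Note the trailing underscore in regressor_: TransformedTargetRegressor fits a clone of the regressor and stores it in regressor_, while regressor remains the unfitted template, which is why regressor.coef_ raises an AttributeError. Similarly, cv_results_ is a plain dict of scores; best_estimator_ is an attribute of the GridSearchCV object itself. See the code below.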

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.compose import TransformedTargetRegressor
np.random.seed(42)

# generate the data
X = np.random.lognormal(0, 1, (100, 3))
y = np.mean(X, axis=1) + np.random.normal(0, 0.1, 100)

# define the pipeline
preprocessor = MinMaxScaler(feature_range=(0, 1))

estimator = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp
)

gridsearchcv = GridSearchCV(
    estimator,
    param_grid={'regressor__fit_intercept': [True, False]},
    cv=5,
    scoring=('r2', 'neg_mean_absolute_error'),
    return_train_score=True,
    refit='neg_mean_absolute_error',
    n_jobs=-1
)

pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('gridsearchcv', gridsearchcv)
])

# fit the pipeline
results = pipeline.fit(X, y)

# extract the estimated coefficients of the best model
results['gridsearchcv'].best_estimator_.regressor_.coef_
# [0.89791824 1.11311974 2.99750775]
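
One thing to keep in mind when reading these coefficients: they describe log(y) as a function of the MinMax-scaled features, not y as a function of the raw features. The sketch below is only an illustration under some assumptions (synthetic data, hypothetical feature names x0..x2, the same pipeline layout as above). It first checks that only regressor_ carries fitted attributes, then rescales the coefficients to the original feature units using the scaler's scale_ and min_ attributes, since MinMaxScaler computes X * scale_ + min_.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (an assumption for illustration); y is strictly positive so np.log is defined
rng = np.random.default_rng(0)
X = rng.lognormal(0, 1, (100, 3))
y = np.exp(X @ np.array([0.1, 0.2, 0.3]) + rng.normal(0, 0.05, 100))

search = GridSearchCV(
    TransformedTargetRegressor(regressor=LinearRegression(), func=np.log, inverse_func=np.exp),
    param_grid={'regressor__fit_intercept': [True, False]},
    cv=5,
    scoring=('r2', 'neg_mean_absolute_error'),
    refit='neg_mean_absolute_error',
)
pipe = Pipeline(steps=[('preprocessor', MinMaxScaler()), ('gridsearchcv', search)])
pipe.fit(X, y)

best = pipe.named_steps['gridsearchcv'].best_estimator_

# `regressor` is the unfitted template; `regressor_` is the fitted clone
template_has_coef = hasattr(best.regressor, 'coef_')
fitted_has_coef = hasattr(best.regressor_, 'coef_')

# Rescale coefficients from scaled-feature space to raw-feature space:
# log(y) = b + c . (X * scale_ + min_) = (b + c . min_) + (c * scale_) . X
scaler = pipe.named_steps['preprocessor']
lin = best.regressor_
coef_raw = lin.coef_ * scaler.scale_
intercept_raw = lin.intercept_ + lin.coef_ @ scaler.min_

# Hypothetical feature names, paired with the rescaled coefficients
feature_names = ['x0', 'x1', 'x2']
coef_by_feature = dict(zip(feature_names, coef_raw))

# Sanity check: the rescaled linear model reproduces the pipeline's predictions
manual_pred = np.exp(intercept_raw + X @ coef_raw)
print(coef_by_feature)
print(np.allclose(manual_pred, pipe.predict(X)))
```

Because the target is log-transformed, each rescaled coefficient is the change in log(y) per unit change of the raw feature, so exp(coefficient) can be read as a multiplicative effect on y.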