I am using the following custom estimator, which wraps a LinearRegression in an sklearn pipeline (if one is provided) and applies a target transformation (if requested):
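
    from copy import deepcopy

    import numpy as np
    import pandas as pd
    from sklearn.compose import TransformedTargetRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import Pipeline

    # base_model_class is my own base class (not shown here)
    class model_linear_regression(base_model_class):
        def __init__(self, pipe=None, inverse=False):
            self.name = 'Linear_Regression'
            self.model = LinearRegression()
            if pipe is None:
                self.pipe = Pipeline([('model', self.model)])
            else:
                # Copy so the caller's pipeline is not modified
                self.pipe = deepcopy(pipe)
                self.pipe.steps.append(('model', self.model))
            if inverse:
                # Fit on log1p(y); predictions are mapped back with expm1
                self.pipe = TransformedTargetRegressor(regressor=self.pipe,
                                                       func=np.log1p,
                                                       inverse_func=np.expm1)

        def fit(self, X: pd.DataFrame, y: pd.Series):
            self.pipe.fit(X, y)
            return self

        def predict(self, X: pd.DataFrame):
            return self.pipe.predict(X)
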
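For reference, here is a minimal sketch, on made-up synthetic data, of what inverse=True does in the class above: TransformedTargetRegressor fits the inner pipeline on np.log1p(y) and passes its predictions through np.expm1, so predict returns values on the original scale of y. The data, the StandardScaler step and the variable names are illustrative assumptions, not part of the original code.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, strictly positive target (illustrative data only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.exp(X @ np.array([0.5, -0.2, 0.1]) + 1.0)

pipe = Pipeline([("scale", StandardScaler()), ("model", LinearRegression())])
ttr = TransformedTargetRegressor(regressor=pipe, func=np.log1p, inverse_func=np.expm1)
ttr.fit(X, y)

# The fitted inner pipeline is exposed as `regressor_`; it was trained on log1p(y)
log_scale_pred = ttr.regressor_.predict(X)
pred = ttr.predict(X)

# predict() applies expm1 to the inner pipeline's log-scale predictions
print(np.allclose(pred, np.expm1(log_scale_pred)))  # True
```

The point of the check is that anything wrapping the estimator (SHAP included) only ever sees predictions on the original scale of y.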
Using it with SHAP raises the following error:

    TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

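One possible cause, and this is an assumption because the question does not show the dtypes of X, is a non-numeric column in the data handed to SHAP: np.isfinite has no implementation for object arrays, and a DataFrame with mixed numeric and string columns turns into an object array when converted to NumPy. This minimal sketch, using a made-up DataFrame, reproduces the same message and lists the non-numeric columns with select_dtypes. It only shows that such data triggers this error; it does not prove that this is what happens inside SHAP here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric and one string (categorical) column
df = pd.DataFrame({"area": [50.0, 72.5, 90.0], "city": ["Oslo", "Rome", "Lima"]})

# Mixed dtypes are combined into a single object-dtype array
arr = df.to_numpy()
print(arr.dtype)  # object

message = ""
try:
    np.isfinite(arr)
except TypeError as exc:
    message = str(exc)
    print(message)

# Columns that would need encoding before they can be treated as numbers
non_numeric = list(df.select_dtypes(exclude="number").columns)
print(non_numeric)  # ['city']
```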
NOTE:
- the pipeline passes an np.ndarray, not a pd.DataFrame, to the estimator
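A minimal sketch of the behaviour described in the NOTE, with a made-up all-numeric DataFrame: by default, sklearn transformers such as StandardScaler return NumPy arrays, so the final step of a Pipeline receives an np.ndarray even when the pipeline was fitted on a DataFrame. Slicing with pipe[:-1] runs only the preprocessing steps, which makes it easy to inspect what the model step actually gets.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical all-numeric data
X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [0.5, 0.1, 0.9, 0.3]})
y = pd.Series([1.0, 2.5, 3.1, 4.8])

pipe = Pipeline([("scale", StandardScaler()), ("model", LinearRegression())])
pipe.fit(X, y)

# pipe[:-1] is a sub-pipeline with every step except the final model
model_input = pipe[:-1].transform(X)
print(type(model_input).__name__)  # ndarray
```

In scikit-learn 1.2 and later, calling pipe.set_output(transform="pandas") should make the transformers return DataFrames instead, in case the model step needs column names.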
EXAMPLE:

    import shap
    from sklearn.model_selection import train_test_split

    def get_shap(model, X, y):
        train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=.3, random_state=42)
        model.fit(train_X, train_y)
        explainer = shap.Explainer(model.predict, test_X)
        shap_values = explainer(test_X)
        return shap_values

    # LINEAR_PIPE is my preprocessing pipeline (not shown here)
    results = get_shap(model_linear_regression(pipe=LINEAR_PIPE, inverse=True), X, y)

How can I get this to work?
ANSWER:
Okay... got a working solution:

    import shap

    # Select the model
    shap_model = model_linear_regression(pipe=LINEAR_PIPE, inverse=True)
    # Fit the model
    model_fitted = shap_model.fit(X_train, y_train)
    # Summarize the data (not necessary, but it makes things much faster).
    # If you don't summarize, replace every X_test_summary below with X_test.
    X_test_summary = shap.sample(X_test, 10)
    # Explain the K = 10 sampled rows, using the same sample as background data
    explainer = shap.KernelExplainer(model_fitted.predict, X_test_summary, keep_index=True)
    shap_values = explainer.shap_values(X_test_summary)
    shap.summary_plot(shap_values, X_test_summary)

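If the pipeline selects columns by name (for example through a ColumnTransformer), the function handed to SHAP has to cope with plain NumPy arrays, which as far as I can tell is what KernelExplainer passes to the model function by default; keep_index=True above appears to be its option for passing a DataFrame instead. As an alternative, here is a minimal sketch of a hypothetical helper, make_frame_predict, that rebuilds a DataFrame with the original column names before calling predict. The data and pipeline are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data; the pipeline refers to columns by name
X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [2.0, 1.0, 4.0, 3.0, 5.0]})
y = pd.Series([3.0, 3.1, 7.2, 6.9, 10.1])

pre = ColumnTransformer([("scale", StandardScaler(), ["a", "b"])])
pipe = Pipeline([("pre", pre), ("model", LinearRegression())])
pipe.fit(X, y)


def make_frame_predict(fitted, columns):
    """Hypothetical helper: wrap predict so it also accepts plain arrays."""
    columns = list(columns)

    def predict(data):
        # Rebuild a DataFrame from an array so name-based column selection works
        if not isinstance(data, pd.DataFrame):
            data = pd.DataFrame(np.asarray(data), columns=columns)
        return fitted.predict(data)

    return predict


predict_fn = make_frame_predict(pipe, X.columns)
array_pred = predict_fn(X.to_numpy())  # the kind of input an explainer passes
frame_pred = pipe.predict(X)
print(np.allclose(array_pred, frame_pred))  # True
```

With shap installed, predict_fn could be passed to shap.KernelExplainer in place of model_fitted.predict; which of the two approaches works better may depend on the shap version.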
Here are some other errors I ran into along the way and managed to solve:
- IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
- AttributeError: 'Kernel' object has no attribute 'masker'
NEW ISSUES:
- Now the problem is that not all plots are available when the result is an np.ndarray (which is what explainer.shap_values returns), so I still need to find a way to fix that.