Using SHAP with custom sklearn estimator

Time:04-10

I have the following custom estimator, which wraps an sklearn pipeline if one is provided and applies a target transformation if requested:

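from copy import deepcopy

import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# base_model_class, X_train, y_train and X_test are assumed to be defined elsewhere.
class model_linear_regression(base_model_class):
    def __init__(self, pipe=None, inverse=False):
        self.name = 'Linear_Regression'
        self.model = LinearRegression()

        # Append the estimator to a copy of the given preprocessing pipeline, if any.
        if pipe is None:
            self.pipe = Pipeline([('model', self.model)])
        else:
            self.pipe = deepcopy(pipe)
            self.pipe.steps.append(('model', self.model))

        # Optionally fit on log1p(y) and map predictions back with expm1.
        if inverse:
            self.pipe = TransformedTargetRegressor(regressor=self.pipe,
                                                   func=np.log1p,
                                                   inverse_func=np.expm1)

    def fit(self, X: pd.DataFrame = X_train, y: pd.Series = y_train):
        self.pipe.fit(X, y)
        return self

    def predict(self, X: pd.DataFrame = X_test):
        return self.pipe.predict(X)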

Using it with SHAP raises the following error:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

NOTE:

  • The pipeline passes an np.ndarray to the estimator, not a pd.DataFrame.
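
One common way to hit this `isfinite` error is an array with `object` dtype, which is what a DataFrame with mixed numeric and string columns turns into when converted to `np.ndarray`. The post does not confirm that this is the cause here, so the following is only a sketch under that assumption; the toy columns `area` and `city` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one numeric and one string column.
df = pd.DataFrame({"area": [50.0, 72.5], "city": ["Oslo", "Bergen"]})

# Converting a mixed-type frame to an array falls back to dtype=object.
arr = df.to_numpy()
print(arr.dtype)

# np.isfinite has no implementation for object arrays, so it raises TypeError.
try:
    np.isfinite(arr)
    raised = False
except TypeError as err:
    raised = True
    print(type(err).__name__)

# Restricting the check to numeric columns works as expected.
numeric = df.select_dtypes(include="number").to_numpy()
finite_mask = np.isfinite(numeric)
print(finite_mask.all())
```

If this is what happens in your case, checking `X.dtypes` before handing the data to SHAP should show which columns are non-numeric.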

EXAMPLE:

def get_shap(model, X, y):
    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=.3, random_state=42)
    model.fit(train_X, train_y)
    explainer = shap.Explainer(model.predict, test_X)
    shap_values = explainer(test_X)
    return shap_values

results = get_shap(model_linear_regression(pipe=LINEAR_PIPE, inverse=True), X, y)

How can I get this to work?

CodePudding user response:

Okay... got a working solution:

# Select model
shap_model = model_linear_regression(pipe=LINEAR_PIPE, inverse=True)
# fit model
model_fitted = shap_model.fit(X_train, y_train)

# Subsample the data (optional, but it makes KernelExplainer faster).
# To skip subsampling, replace every X_test_summary with X_test.
X_test_summary = shap.sample(X_test, 10)

# Explain the sampled rows
explainer = shap.KernelExplainer(model_fitted.predict, X_test_summary, keep_index=True)
shap_values = explainer.shap_values(X_test_summary)

shap.summary_plot(shap_values, X_test_summary)
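
Since the pipeline may need column names while SHAP hands the prediction function a plain array, another option is to wrap `predict` so that it rebuilds a DataFrame first. This is a minimal sketch, not the code from the post: the toy data, the column names `a`, `b`, `c`, and the helper `make_predict_fn` are all hypothetical stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for X_train / y_train.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.uniform(1, 10, size=(40, 3)), columns=["a", "b", "c"])
y_demo = pd.Series(2 * X_demo["a"] + X_demo["b"] + 1.0)

# The ColumnTransformer selects columns by name, so it needs DataFrame input.
pipe = Pipeline([
    ("prep", ColumnTransformer([("scale", StandardScaler(), ["a", "b", "c"])])),
    ("model", LinearRegression()),
])
regressor = TransformedTargetRegressor(
    regressor=pipe, func=np.log1p, inverse_func=np.expm1
)
regressor.fit(X_demo, y_demo)


def make_predict_fn(model, columns):
    """Return a predict function that accepts a plain array (hypothetical helper)."""
    def predict_from_array(data):
        # Restore the column names the pipeline expects before predicting.
        frame = pd.DataFrame(np.asarray(data), columns=columns)
        return model.predict(frame)
    return predict_from_array


predict_fn = make_predict_fn(regressor, X_demo.columns)
preds = predict_fn(X_demo.to_numpy()[:5])
print(preds.shape)
```

A function like `predict_fn` can then be passed to `shap.KernelExplainer` in place of `model_fitted.predict`. Note that rebuilding the frame from an array loses the original per-column dtypes, so this fits all-numeric data best.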

Here are some other errors I ran into and resolved along the way:

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

AttributeError: 'Kernel' object has no attribute 'masker'

NEW ISSUES:

  • The remaining problem is that not all plots are available when the result is an np.ndarray, so I still need to find a way to fix that.