I'm trying to explain a Keras LSTM model using LIME text explainer. I have news titles and a binary target variable (the sentiment).
My model is the following:
vocab_size = len(tokenizer.word_index) + 1
embedding_dim = 16
max_length = 3000
trunc_type='post'
padding_type='post'
oov_tok = "<OOV>"
training_sequences = tokenizer.texts_to_sequences(X_titles_tr) # train texts
training_padded = pad_sequences(training_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)
testing_sequences = tokenizer.texts_to_sequences(X_titles_te) # tests texts
testing_padded = pad_sequences(testing_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)
num_epochs = 4
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['AUC'])
model.fit(training_padded, y_train, epochs=num_epochs, validation_data=(testing_padded, y_test), verbose=1)
I want to use a LimeTextExplainer in the following manner:
explainer = LimeTextExplainer(class_names=["bad", "good"])
exp = explainer.explain_instance("Text to explain", model.predict, num_features=7)
However, my model takes a padded sequence as input (not a string). So, instead of model.predict, I have implemented a custom predict function which first preprocesses the input and then makes a prediction:
def my_predict_function(x):
    testing_sequences = tokenizer.texts_to_sequences(x)
    testing_padded = pad_sequences(testing_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)
    return model.predict(testing_padded)
Still, this does not solve the problem, and I encounter the following error:
IndexError: index 1 is out of bounds for axis 1 with size 1
CodePudding user response:
I think you cannot pass a single input to your model, even when you use the predict method of your Keras model. Instead, you need to give it a list of inputs, so when you have a single input you need to wrap it in [ ] in your code.
Maybe try this code as your prediction function:
def my_predict_function(x):
    testing_sequences = tokenizer.texts_to_sequences(x)
    testing_padded = pad_sequences(testing_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)
    return model.predict([testing_padded])
CodePudding user response:
From the lime documentation:
classifier_fn – classifier prediction probability function, which takes a list of d strings and outputs a (d, k) numpy array with prediction probabilities, where k is the number of classes.
You have two classes, but your predictions are squashed onto one axis: if you have 10 predictions, you get a [10, 1] sized tensor. You need to convert this to [10, 2]. In other words, if your model outputs the probabilities [0.2, 0.8, 0.9], you need to reshape them into two columns: [[0.8, 0.2], [0.2, 0.8], [0.1, 0.9]] (assuming bad -> 0 and good -> 1).
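As a concrete illustration, here is a minimal NumPy sketch of that conversion, using the illustrative probabilities from above:
import numpy as np

# Sigmoid outputs as a (3, 1) column vector: P(good) for each input
pred = np.array([[0.2], [0.8], [0.9]])

# Prepend P(bad) = 1 - P(good) to obtain the (3, 2) array LIME expects
two_col = np.concatenate([1.0 - pred, pred], axis=1)
print(two_col)  # [[0.8 0.2] [0.2 0.8] [0.1 0.9]]
Applied to your model, the full prediction function looks like this: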
import numpy as np
import tensorflow as tf
from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["bad", "good"])

def my_predict_function(x):
    # LIME passes a list of raw strings; tokenize and pad them exactly as during training
    testing_sequences = tokenizer.texts_to_sequences(x)
    testing_padded = tf.keras.preprocessing.sequence.pad_sequences(testing_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)
    # (d, 1) sigmoid outputs: P(good)
    pred = model.predict(testing_padded)
    # Stack P(bad) = 1 - P(good) alongside, giving the (d, 2) array LIME expects
    format_pred = np.concatenate([1.0 - pred, pred], axis=1)
    return format_pred

exp = explainer.explain_instance("movie is bad", my_predict_function, num_features=7)
print(exp.as_list())
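Alternatively, if you are free to change the model itself, a common design choice is to give the network a 2-unit softmax head, so that model.predict already returns a (d, 2) probability array and no post-processing is needed. This is only a sketch under that assumption, not part of the code above; note the loss switches to sparse_categorical_crossentropy so the existing integer labels (0 = bad, 1 = good) still work:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(16, activation='relu'),
    # Two output units: column 0 = P(bad), column 1 = P(good)
    tf.keras.layers.Dense(2, activation='softmax'),
])
# sparse_categorical_crossentropy accepts the integer labels already used above
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])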