In text classification, how do I find the part of a sentence that is important for the classification?

I have trained a text classification model that works well. Now I want to dig deeper and understand which words or phrases in a sentence had the most impact on the classification outcome, for each possible outcome.

I am using Keras for the classification. Below is the code I use to train the model: a simple text classifier made of an embedding layer followed by global max pooling.

```python
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Sequential

# vocab_size, maxlen, X_tr, y_tr, X_te, y_te come from the
# tokenization/padding step (not shown here)

# early stopping: stop when validation accuracy stops improving
early_stop = EarlyStopping(monitor='val_accuracy', min_delta=0,
                           patience=5, verbose=2, mode='auto',
                           restore_best_weights=True)

# select optimizer
opt = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999,
                               epsilon=1e-07, amsgrad=False, name="Adam")
embedding_dim = 50

# declare model: embedding -> global max pooling -> dense(relu) -> dense(sigmoid)
model = Sequential()
model.add(layers.Embedding(input_dim=vocab_size,
                           output_dim=embedding_dim,
                           input_length=maxlen))
model.add(layers.GlobalMaxPool1D())
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer=opt,
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()

# fit model
history = model.fit(X_tr, y_tr,
                    epochs=20,
                    verbose=True,
                    validation_data=(X_te, y_te),
                    batch_size=10, callbacks=[early_stop])
# note: this evaluates on the training set
loss, accuracy = model.evaluate(X_tr, y_tr, verbose=False)
```

How do I extract the phrases/words that have the maximum impact on the classification outcome?

Answer:

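The keywords you are looking for are "neural network interpretability" and "feature attribution". One of the best-known methods in this area is Integrated Gradients (IG): it attributes the model's prediction to each input feature (in your case, each word embedding) by accumulating the gradients of the output along a straight path from a baseline input to the actual input. Summing a word's attributions over the embedding dimensions gives one score per word.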
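To make this concrete for the model in the question, here is a minimal sketch of Integrated Gradients in plain NumPy, so every gradient is explicit. It re-implements the forward pass of the Embedding -> GlobalMaxPool1D -> Dense(10, relu) -> Dense(1, sigmoid) model and assumes an all-zeros embedding vector as the baseline (a common choice, not the only one). The function names are hypothetical, and it assumes the trained weights are passed in the order `E, W1, b1, W2, b2`, which is the order `model.get_weights()` should return for this `Sequential` model. Positive scores mean the token pushed the prediction towards class 1 relative to the baseline.

```python
import numpy as np

# Hypothetical helper names: a sketch of IG for the question's model, not a library API.


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(emb_seq, W1, b1, W2, b2):
    """P(class 1) for one already-embedded sequence of shape (maxlen, dim)."""
    pooled = emb_seq.max(axis=0)                 # GlobalMaxPool1D
    hidden = np.maximum(pooled @ W1 + b1, 0.0)   # Dense(10, activation='relu')
    return sigmoid(hidden @ W2 + b2)[0]          # Dense(1, activation='sigmoid')


def grad_wrt_embeddings(emb_seq, W1, b1, W2, b2):
    """Analytic gradient of the sigmoid output w.r.t. every embedding entry."""
    pooled = emb_seq.max(axis=0)
    pre_hidden = pooled @ W1 + b1
    hidden = np.maximum(pre_hidden, 0.0)
    p = sigmoid(hidden @ W2 + b2)[0]
    d_hidden = p * (1.0 - p) * W2[:, 0]               # back through the sigmoid unit
    d_pooled = (d_hidden * (pre_hidden > 0)) @ W1.T   # back through ReLU and Dense(10)
    # Max pooling sends each dimension's gradient only to the position that won the max.
    grad = np.zeros_like(emb_seq)
    winners = emb_seq.argmax(axis=0)
    grad[winners, np.arange(emb_seq.shape[1])] = d_pooled
    return grad


def integrated_gradients(token_ids, E, W1, b1, W2, b2, n_steps=200):
    """One IG score per token position, using an all-zeros embedding baseline (assumption)."""
    x = E[token_ids]                               # embedded input, (maxlen, dim)
    baseline = np.zeros_like(x)
    alphas = (np.arange(n_steps) + 0.5) / n_steps  # midpoint Riemann sum over [0, 1]
    avg_grad = np.mean(
        [grad_wrt_embeddings(baseline + a * (x - baseline), W1, b1, W2, b2) for a in alphas],
        axis=0,
    )
    attributions = (x - baseline) * avg_grad       # (maxlen, dim)
    return attributions.sum(axis=1)                # sum over embedding dims -> per-token score


if __name__ == "__main__":
    # Random stand-in weights; with the trained model you would instead use
    # E, W1, b1, W2, b2 = model.get_weights()
    rng = np.random.default_rng(0)
    E = rng.normal(scale=0.5, size=(20, 8))
    W1, b1 = rng.normal(scale=0.5, size=(8, 10)), np.zeros(10)
    W2, b2 = rng.normal(scale=0.5, size=(10, 1)), np.zeros(1)
    ids = np.array([4, 11, 7, 2, 15, 9])           # one tokenized sentence (token ids)
    scores = integrated_gradients(ids, E, W1, b1, W2, b2)
    for pos, (tok, s) in enumerate(zip(ids, scores)):
        print(f"position {pos}: token id {tok}, score {s:+.4f}")
```

A useful sanity check is IG's completeness property: the scores should sum to roughly `forward(input) - forward(baseline)`. In TensorFlow itself you would normally replace the hand-written gradient with `tf.GradientTape` applied to the embedded input, and map the token ids back to words through whatever vocabulary produced `X_tr` (for example a Keras `Tokenizer`'s `index_word` mapping, if that is what you used).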

This tutorial shows how to implement IG in pure TensorFlow for images, and this one uses the alibi library to highlight the words in the input text that have the highest impact on a classification model.