I'm doing sentiment analysis of Amazon reviews with an RNN (LSTM). df2['Text'] contains the customer review texts, and df2['label'] is a binary integer label, 0 or 1.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

tokenizer = Tokenizer(num_words=5000, split=' ')
tokenizer.fit_on_texts(df2['Text'].values)
encoded_docs = tokenizer.texts_to_sequences(df2['Text'].values)
X = pad_sequences(encoded_docs, maxlen=1000)
X.shape  # (3872, 1000)
y = df2['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
This is my model:
import tensorflow as tf
from tensorflow.keras.layers import Embedding, LSTM

model = tf.keras.Sequential()
# input_dim must cover the tokenizer vocabulary (num_words=5000), not 1000,
# otherwise word indices >= 1000 fall outside the embedding table
model.add(Embedding(5000, 64, input_length=X.shape[1]))
model.add(LSTM(176, dropout=0.4, recurrent_dropout=0.4))
model.add(tf.keras.layers.Dropout(0.3))
model.add(tf.keras.layers.Dense(32, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
print(model.summary())

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])

batch_size = 128
history = model.fit(X_train, y_train, epochs=13, batch_size=batch_size,
                    validation_data=(X_test, y_test))
The validation accuracy for the last epoch is around 0.86.
And then I tried to predict the result of a text:
def anal_sent(my_text, my_model, my_tokenizer):
    encoded_text = my_tokenizer.texts_to_sequences(my_text)
    X = pad_sequences(encoded_text, maxlen=1000)
    return my_model.predict(X)
ex_review = "I bought it for my son and he says he likes it."
print(anal_sent(ex_review, model, tokenizer)) # this tokenizer is what I used for training dataset.
But the output is an array with many rows, like [[0.73], [0.68], ...], instead of a single 0 or 1.
Is there anything wrong? What's the correct way to make a prediction?
CodePudding user response:
texts_to_sequences expects a list of texts. If you pass it a single string, it iterates over the string character by character and treats each character as a separate text, which is why you get one prediction per character. Wrap the review in a list:

ex_review = ["I bought it for my son and he says he likes it."]
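Note also that the sigmoid output is a probability in [0, 1], not a hard class, so even with the list fix you still need to threshold it to get 0 or 1 (0.5 is the conventional cutoff, but that's a choice, not something the model dictates). A minimal NumPy sketch of that step, using simulated predictions rather than the trained model:

```python
import numpy as np

def probs_to_labels(probs, threshold=0.5):
    """Convert model.predict output of shape (n, 1) into hard 0/1 labels."""
    probs = np.asarray(probs).reshape(-1)      # flatten (n, 1) -> (n,)
    return (probs >= threshold).astype(int)

# simulated model.predict output for two reviews
print(probs_to_labels([[0.73], [0.31]]))       # [1 0]
```

With ex_review wrapped in a list, anal_sent(ex_review, model, tokenizer) returns a (1, 1) array, and probs_to_labels turns it into a single 0 or 1.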