I followed a training course, and its final Jupyter notebook is this one:
https://colab.research.google.com/drive/1Lmh1b5Ge9NodxIrukCTJC3cpYQDn9VuM?usp=sharing
I understand all of the code and how the model was trained.
At the end, I am predicting emotions for tweets in the test dataset like this:
i = random.randint(0, len(test_labels) - 1)  # pick a random test example
print('Sentence:', test_tweets[i])
print('Emotion:', index_to_class[test_labels[i]])
p = model.predict(np.expand_dims(test_seq[i], axis=0))[0]  # add a batch dimension: (50,) -> (1, 50)
pred_class = index_to_class[np.argmax(p).astype('uint8')]
print('Predicted Emotion:', pred_class)
This works perfectly fine.
However, I want to test the model's predictions on arbitrary sentences of my own, like:
sentence = 'I love you more than ever'
print('Sentence:', sentence)
#print('Emotion:', index_to_class[test_labels[i]])
p = model.predict(np.expand_dims(sentence, axis=0))[0]
pred_class = index_to_class[np.argmax(p).astype('uint8')]
print('Predicted Emotion:', pred_class)
But I got this error:
Sentence: I love you more than ever
WARNING:tensorflow:Model was constructed with shape (None, 50) for input KerasTensor(type_spec=TensorSpec(shape=(None, 50), dtype=tf.float32, name='embedding_input'), name='embedding_input', description="created by layer 'embedding_input'"), but it was called on an input with incompatible shape (None,).
What am I missing here?
CodePudding user response:
Your model needs an integer sequence, not a raw string. Try converting the sentence to its corresponding integer sequence first:
sentence = 'I love you more than ever'
print('Sentence:', sentence)
# Convert the raw string into a padded integer sequence of shape (1, 50)
sentence_seq = get_sequences(tokenizer, np.expand_dims(sentence, axis=0))
p = model.predict(sentence_seq)[0]
pred_class = index_to_class[np.argmax(p).astype('uint8')]
print('Predicted Emotion:', pred_class)
Output:
Sentence: I love you more than ever
Predicted Emotion: joy
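If get_sequences isn't in scope where you run this, it's the notebook's small helper around the Keras tokenizer. A minimal sketch of what it presumably does, assuming maxlen=50 and post-padding/truncating (the exact settings in the linked notebook may differ, so match whatever was used during training):

from tensorflow.keras.preprocessing.sequence import pad_sequences

def get_sequences(tokenizer, tweets, maxlen=50):
    # Map each word to the integer index learned when the tokenizer was fitted
    sequences = tokenizer.texts_to_sequences(tweets)
    # Pad/truncate every sequence to exactly maxlen tokens -> shape (batch, 50)
    return pad_sequences(sequences, maxlen=maxlen, padding='post', truncating='post')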
CodePudding user response:
Just to add a little:

Shape
np.expand_dims(sentence, axis=0).shape is (1,), not (None, 50) - it needs one more dimension for the batch size.

Sequences
The input to your model is a padded sequence of numbers, transformed by a tokenizer - it should be 50 in length.
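
Putting the two together, you can build the expected (1, 50) input directly from the raw sentence. A minimal sketch, assuming the tokenizer fitted during training and maxlen=50 (the padding/truncating mode here is an assumption, so match it to the notebook):

from tensorflow.keras.preprocessing.sequence import pad_sequences

sentence = 'I love you more than ever'
# texts_to_sequences expects a list of strings and returns a list of integer lists
seq = tokenizer.texts_to_sequences([sentence])
# Pad/truncate to the 50 tokens the Embedding layer was built for -> shape (1, 50)
padded = pad_sequences(seq, maxlen=50, padding='post', truncating='post')
p = model.predict(padded)[0]
print('Predicted Emotion:', index_to_class[np.argmax(p)])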