I have a seq2seq model built like so:
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

latent_dim = 256
epochs = 20
batch_size = 64

encoder_inputs = Input(shape=(None,))
x = Embedding(num_encoder_tokens, latent_dim, input_length=max_english_sentence_length)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None,))
x = Embedding(num_decoder_tokens, latent_dim, input_length=max_toki_sentence_length)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=["accuracy"])
model.summary()
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
encoder_input_data has shape (2000, 57, 7265) and contains 2000 sentences of at most 57 words, with one-hot encoded tokens.
decoder_input_data and decoder_target_data have shape (2000, 87, 987) and contain 2000 sentences of at most 87 words, with one-hot encoded tokens. decoder_target_data is offset by one timestep from decoder_input_data.
As far as I'm aware, the data is formatted correctly, but when running model.fit I get:
Input 0 of layer "lstm" is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (64, 57, 7265, 256)
What am I doing wrong here?
CodePudding user response:
The issue comes from the Embedding layer. You cannot feed one-hot encoded data into the Keras Embedding layer. Indeed, as stated in the docs:
Input shape
2D tensor with shape: (batch_size, input_length).
Output shape
3D tensor with shape: (batch_size, input_length, output_dim).
Note that it takes a 2D array and outputs a 3D array. This is why your input went from a 3D array to a 4D array after the embedding. And that is no good for your LSTM, which only accepts 3D input.
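You can see this shape behavior directly with a tiny Embedding layer (the vocabulary size and dimensions here are made up just for illustration):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Embedding

# Toy embedding: vocabulary of 10 tokens, 4-dimensional vectors.
emb = Embedding(input_dim=10, output_dim=4)

# Integer-encoded input, shape (batch_size, input_length) -> 3D output.
int_input = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
print(emb(int_input).shape)                    # (2, 3, 4)

# One-hot input carries an extra vocabulary axis, so the embedding
# tacks output_dim onto a 3D input and produces a 4D tensor --
# exactly the ndim=4 the LSTM complains about.
onehot_input = tf.one_hot(int_input, depth=10)  # shape (2, 3, 10)
print(emb(onehot_input).shape)                  # (2, 3, 10, 4)
```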
You should convert your one-hot encoding to plain integer indices. So instead of a [0, 0, 1, 0, 0] input, you just need the single value 2.
That way your inputs will be a 2D array (batch_size, input_length), which the embedding layer converts into a nice 3D array (batch_size, input_length, output_dim). And your LSTM layer will be happy ;)
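The conversion is a single argmax over the vocabulary axis. Here is a minimal sketch with a tiny dummy array standing in for your (2000, 57, 7265) encoder_input_data; the same call works on the real arrays:

```python
import numpy as np

# Dummy integer token ids: 2 sentences of 3 tokens, vocabulary of 5.
tokens = np.array([[1, 2, 3], [4, 0, 2]])       # shape (2, 3)

# The equivalent one-hot encoding, shape (sentences, timesteps, vocab).
onehot = np.eye(5, dtype=np.float32)[tokens]    # shape (2, 3, 5)

# argmax over the last (vocabulary) axis recovers the integer ids,
# giving the 2D (batch_size, input_length) shape Embedding expects.
int_encoded = np.argmax(onehot, axis=-1)
print(int_encoded.shape)                        # (2, 3)
```

Apply this to encoder_input_data and decoder_input_data before calling model.fit; decoder_target_data can stay one-hot since categorical_crossentropy expects one-hot targets.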