Two input layers for LSTM Neural Network?

I am building a neural network and need to add a second input layer (until now I needed only one). This was the previous code:

###...
        if(self.net_embedding==0):
            l_input = Input(shape=self.win_size, dtype='int32', name='input_act')
            emb_input = Embedding(output_dim=params["output_dim_embedding"], input_dim=unique_events + 1, input_length=self.win_size)(l_input)
            toBePassed=emb_input
        elif(self.net_embedding==1):
            self.getWord2VecEmbeddings(params['word2vec_size'])
            X_train=self.encodePrefixes(params['word2vec_size'],X_train)
            l_input = Input(shape = (self.win_size, params['word2vec_size']), name = 'input_act')
            toBePassed=l_input

        l1 = LSTM(params["shared_lstm_size"],return_sequences=True, kernel_initializer='glorot_uniform',dropout=params['dropout'])(toBePassed)
        l1 = BatchNormalization()(l1)
#and so on with the rest of the layers...

The input of the model (X_train) was just an array of arrays (with size = self.win_size) of integers (e.g. [[0 1 2 3] [1 2 3 4]...] if self.win_size = 4), where the integers represent categorical elements.

As you can see, I also have two types of embeddings for this input:

Embedding layer
Word2Vec encoding

Now I need to add another input to the net, which is also an array of arrays (again with size = self.win_size) of integers (e.g. [[0 123 334 2212] [123 334 2212 4888]...]). This time I don't think I need to apply any embedding, because the elements are not categorical (they represent elapsed time in seconds).

I tried by simply changing the net to:

#...
        if(self.net_embedding==0):
            l_input = Input(shape=self.win_size, dtype='int32', name='input_act')
            emb_input = Embedding(output_dim=params["output_dim_embedding"], input_dim=unique_events + 1, input_length=self.win_size)(l_input)
            toBePassed=emb_input
        elif(self.net_embedding==1):
            self.getWord2VecEmbeddings(params['word2vec_size'])
            X_train=self.encodePrefixes(params['word2vec_size'],X_train)
            l_input = Input(shape = (self.win_size, params['word2vec_size']), name = 'input_act')
            toBePassed=l_input

        elapsed_time_input = Input(shape=self.win_size, name='input_time')
        input_concat = Concatenate(axis=1)([toBePassed, elapsed_time_input])

        l1 = LSTM(params["shared_lstm_size"],return_sequences=True, kernel_initializer='glorot_uniform',dropout=params['dropout'])(input_concat)
        l1 = BatchNormalization()(l1)
#and so on with other layers...

but I get the error:

ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 4, 12), (None, 4)]

Do you have any solution for this? Any kind of help would be really appreciated, since I have a deadline in a few days and I've been banging my head against this for a long time now! Thanks :)

CodePudding user response:

There are two problems with your approach.

First, inputs to LSTM should have a shape of (batch_size, num_steps, num_feats), yet your elapsed_time_input has shape (None, 4). You need to expand its dimension to get the proper shape (None, 4, 1).

elapsed_time_input = tf.keras.layers.Reshape((-1, 1))(elapsed_time_input)
or
elapsed_time_input = tf.expand_dims(elapsed_time_input, axis=-1)

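With this, "elapsed time in seconds" is treated as just another feature of each timestep. Note that if you reuse the name elapsed_time_input for the reshaped tensor, you should keep a separate reference to the original Input layer, because Model(inputs=...) still needs it.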
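To see what the extra dimension does to the data itself, here is a minimal NumPy sketch. NumPy is used only for illustration, and the batch of two windows is an assumption built from the example values in the question.

```python
import numpy as np

# Hypothetical batch of 2 windows with win_size = 4 (example values from the question).
elapsed = np.array([[0, 123, 334, 2212],
                    [123, 334, 2212, 4888]], dtype="float32")

# Add a trailing feature axis: (batch, steps) -> (batch, steps, 1).
expanded = np.expand_dims(elapsed, axis=-1)

# Reshaping each sample to (-1, 1) produces the same array.
reshaped = elapsed.reshape(elapsed.shape[0], -1, 1)

print(elapsed.shape)   # (2, 4)
print(expanded.shape)  # (2, 4, 1)
```

Each timestep now carries a one-element feature vector, which is the layout an LSTM expects.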

Secondly, you'll want to concatenate the two inputs in the feature dimension (not the timestep dimension).

input_concat = Concatenate(axis=-1)([toBePassed, elapsed_time_input])
or
input_concat = Concatenate(axis=2)([toBePassed, elapsed_time_input])

After this, you'll get a Keras tensor with a shape of (None, 4, 13). It represents a batch of time series, each having 4 timesteps and 13 features per step (the 12 original features + the elapsed time in seconds for each step).
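Putting both fixes together, a minimal end-to-end sketch could look like the following. It covers only the Embedding branch, and several names and sizes are assumptions made for illustration: unique_events = 10, an LSTM size of 8, the Dense(1) head, and the variable names act_in, time_in and time_feat do not come from the question. The embedding size of 12 is chosen to match the (None, 4, 12) shape in the error message.

```python
import numpy as np
import tensorflow as tf

win_size = 4         # window length from the question
unique_events = 10   # hypothetical number of distinct events
emb_dim = 12         # matches the (None, 4, 12) shape in the error message

# Categorical input, passed through an Embedding layer.
act_in = tf.keras.layers.Input(shape=(win_size,), dtype="int32", name="input_act")
emb = tf.keras.layers.Embedding(input_dim=unique_events + 1, output_dim=emb_dim)(act_in)

# Numeric input; the Input tensor keeps its own name so it can be given to Model(...).
time_in = tf.keras.layers.Input(shape=(win_size,), name="input_time")
time_feat = tf.keras.layers.Reshape((win_size, 1))(time_in)  # (None, 4) -> (None, 4, 1)

# Concatenate along the feature axis, not the timestep axis.
merged = tf.keras.layers.Concatenate(axis=-1)([emb, time_feat])  # (None, 4, 13)

x = tf.keras.layers.LSTM(8)(merged)   # hypothetical LSTM size
out = tf.keras.layers.Dense(1)(x)     # hypothetical output head

model = tf.keras.Model(inputs=[act_in, time_in], outputs=out)

# Two example windows per input, using the values from the question.
X_act = np.array([[0, 1, 2, 3], [1, 2, 3, 4]], dtype="int32")
X_time = np.array([[0, 123, 334, 2212], [123, 334, 2212, 4888]], dtype="float32")
pred = model.predict([X_act, X_time], verbose=0)
print(pred.shape)
```

When training, the two arrays are passed in the same order as the inputs list, e.g. model.fit([X_act, X_time], y, ...).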