batch size in input_shape statement for Keras Conv1D layers


My data (after reshaping):

  • X_train = numpy_array, shape: (21000, 2297, 1)
  • X_val = numpy_array, shape: (9000, 2297, 1)

Both arrays contain time series. All time series have length 2297 due to padding.


My model:

from tensorflow import keras
from tensorflow.keras.layers import Conv1D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam

model = keras.Sequential()
model.add(Conv1D(32, 2, activation='relu', input_shape=(2297, 1)))  # input_shape = (n_columns, 1)
model.add(Dropout(0.2))
model.add(Conv1D(256, 2, activation='relu'))
model.add(Dropout(0.2))
model.add(Conv1D(32, 2, activation='relu'))
model.add(Flatten())
model.add(Dense(1))

model.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae', 'mse'])
model.summary()
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val), verbose=1)

My problem:

If I leave the input_shape argument as above, the model runs fine but takes very long to train. I guess this is because not passing a batch size makes the model use the full batch. Is that right?

I want to pass a batch size in order to make the network train on just a fraction of my data in each step. According to this post and this post, the correct order of entries for my data is:

input_shape=(batch_size, time_steps, 1)

But say my desired batch size is 1000. Then my first Conv1D layer looks as follows (the rest of the model remains as posted above):

model = keras.Sequential()
model.add(Conv1D(32, 2, activation='relu', input_shape=(1000, 2297, 1))) # input_shape = (batch_size, n_columns, 1) ...

and raises the following error:

ValueError: Input 0 of layer conv1d_3 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 1000, 2297, 1]

Why is that and how can I pass the batch size correctly?

CodePudding user response:

Your first version of the code is correct: input_shape describes a single sample, and Keras prepends the batch dimension itself (the None in the error message). Adding the batch size to input_shape therefore makes the layer expect 4-D input, which is exactly the ndim error you see.

"How can I pass the batch size correctly?" You don't pass it through input_shape at all; you set it in model.fit(..., batch_size=1000), as shown below.
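For example, keeping the model exactly as posted in the question (input_shape=(2297, 1)), the desired batch size of 1000 goes into the training call. A minimal sketch reusing the question's model and data:

history = model.fit(X_train, y_train,
                    batch_size=1000,  # one gradient update per 1000 samples
                    epochs=10,
                    validation_data=(X_val, y_val),
                    verbose=1)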

If your model takes very long to train:

  1. Make sure you train the model on a GPU (see the check after this list).
  2. Use fewer filters. (You use filters=256 in the second Conv1D layer; 16 or 32 is often enough here.)
  3. Use larger strides. (Conv1D defaults to strides=1; the code below uses strides=2, see the sketch after this list.)
  4. Keep kernel_size small. (You use kernel_size=2, which is fine.)
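Two of these points can be checked directly. A minimal sketch (assuming TensorFlow 2.x; the output-length formula holds for the default padding='valid' used in the model below):

import tensorflow as tf

# Point 1: confirm TensorFlow actually sees a GPU (an empty list means CPU only).
print(tf.config.list_physical_devices('GPU'))

# Point 3: with padding='valid', Conv1D shrinks the time axis to
# floor((length - kernel_size) / strides) + 1, so strides=2 roughly halves it.
def conv1d_out_len(length, kernel_size, strides=1):
    return (length - kernel_size) // strides + 1

length = 2297
for kernel_size, strides in [(2, 2), (2, 2), (2, 1)]:  # the three Conv1D layers below
    length = conv1d_out_len(length, kernel_size, strides)
    print(length)  # 1148, 574, 573 -- matches model.summary() below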

Full Code: (training time for 10 epochs -> 10 sec)

import tensorflow as tf
import numpy as np

# Dummy data matching the question's shapes (random values)
X_train = np.random.rand(21000, 2297, 1)
y_train = np.random.randint(0, 2, 21000)

X_val = np.random.rand(9000, 2297, 1)
y_val = np.random.randint(0, 2, 9000)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv1D(filters=16,
                                 kernel_size=2,
                                 strides=2,  # strides=2 halves the time axis (2297 -> 1148)
                                 activation='relu', input_shape=(2297, 1)))

model.add(tf.keras.layers.Dropout(0.2))

model.add(tf.keras.layers.Conv1D(filters = 32, 
                                 kernel_size = 2, 
                                 strides = 2,
                                 activation='relu'))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Conv1D(16, 2, activation='relu'))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1))

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), 
              loss = 'mse', metrics = ['mae', 'mse']) 
model.summary()

history = model.fit(X_train, y_train,
                    batch_size=128,  # batch size is set here, not in input_shape
                    epochs=10,
                    validation_data=(X_val, y_val),
                    verbose=1)

Output:

Model: "sequential_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv1d_21 (Conv1D)          (None, 1148, 16)          48        
                                                                 
 dropout_12 (Dropout)        (None, 1148, 16)          0         
                                                                 
 conv1d_22 (Conv1D)          (None, 574, 32)           1056      
                                                                 
 dropout_13 (Dropout)        (None, 574, 32)           0         
                                                                 
 conv1d_23 (Conv1D)          (None, 573, 16)           1040      
                                                                 
 flatten_6 (Flatten)         (None, 9168)              0         
                                                                 
 dense_6 (Dense)             (None, 1)                 9169      
                                                                 
=================================================================
Total params: 11,313
Trainable params: 11,313
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
165/165 [==============================] - 3s 13ms/step - loss: 0.2608 - mae: 0.5008 - mse: 0.2608 - val_loss: 0.2747 - val_mae: 0.4993 - val_mse: 0.2747
Epoch 2/10
165/165 [==============================] - 2s 13ms/step - loss: 0.2521 - mae: 0.4991 - mse: 0.2521 - val_loss: 0.2865 - val_mae: 0.4993 - val_mse: 0.2865
Epoch 3/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2499 - mae: 0.4969 - mse: 0.2499 - val_loss: 0.2988 - val_mae: 0.4991 - val_mse: 0.2988
Epoch 4/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2484 - mae: 0.4952 - mse: 0.2484 - val_loss: 0.2850 - val_mae: 0.4993 - val_mse: 0.2850
Epoch 5/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2481 - mae: 0.4926 - mse: 0.2481 - val_loss: 0.2650 - val_mae: 0.5001 - val_mse: 0.2650
Epoch 6/10
165/165 [==============================] - 2s 9ms/step - loss: 0.2457 - mae: 0.4899 - mse: 0.2457 - val_loss: 0.2824 - val_mae: 0.4998 - val_mse: 0.2824
Epoch 7/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2432 - mae: 0.4856 - mse: 0.2432 - val_loss: 0.2591 - val_mae: 0.5005 - val_mse: 0.2591
Epoch 8/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2426 - mae: 0.4824 - mse: 0.2426 - val_loss: 0.2649 - val_mae: 0.5009 - val_mse: 0.2649
Epoch 9/10
165/165 [==============================] - 2s 10ms/step - loss: 0.2392 - mae: 0.4781 - mse: 0.2392 - val_loss: 0.2693 - val_mae: 0.5009 - val_mse: 0.2693
Epoch 10/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2366 - mae: 0.4733 - mse: 0.2366 - val_loss: 0.2688 - val_mae: 0.5012 - val_mse: 0.2688

Note:

  1. I use random numbers as input.
  2. Samples = 21000, batch_size = 128 -> steps per epoch = ceil(21000 / 128) = ceil(164.06) = 165, matching the 165/165 in the training log above.
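A quick check of that step count in plain Python:

import math
print(math.ceil(21000 / 128))  # 165 steps per epoch; the last batch holds only 8 samples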