My data (after reshaping):
X_train = numpy array, shape: (21000, 2297, 1)
X_val   = numpy array, shape: (9000, 2297, 1)
Both arrays contain time series. All time series have length 2297 due to padding.
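For reference, the padding was done roughly like this (an illustrative sketch, not my exact preprocessing code; the variable-length series here are made up):
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Made-up variable-length 1-D series standing in for the real data.
series = [np.random.rand(np.random.randint(100, 2298)) for _ in range(5)]

# Zero-pad every series to length 2297, then add a channel axis so each
# sample has shape (2297, 1), as Conv1D expects.
padded = pad_sequences(series, maxlen=2297, dtype='float32', padding='post')
X = padded[..., np.newaxis]  # shape: (n_samples, 2297, 1)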
My model:
from tensorflow import keras
from tensorflow.keras.layers import Conv1D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam

model = keras.Sequential()
model.add(Conv1D(32, 2, activation='relu', input_shape=(2297, 1)))  # input_shape = (n_columns, 1)
model.add(Dropout(0.2))
model.add(Conv1D(256, 2, activation='relu'))
model.add(Dropout(0.2))
model.add(Conv1D(32, 2, activation='relu'))
model.add(Flatten())
model.add(Dense(1))
model.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae', 'mse'])
model.summary()
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val), verbose=1)
My problem:
If I leave the input_shape as above, the model runs fine but takes very long to train. I guess this is because not passing a batch size makes the model use the full batch. Is that right?
I want to pass a batch size so that the network trains on just a fraction of my data in each step. According to this post and this post, the correct order of entries for my data is:
input_shape=(batch size, time steps, 1)
But say my desired batch size is 1000. Then my first Conv1D layer looks as follows (the rest of the model remains as posted above):
model = keras.Sequential()
model.add(Conv1D(32, 2, activation='relu', input_shape=(1000, 2297, 1))) # input_shape = (batch_size, n_columns, 1) ...
and raises the following error:
ValueError: Input 0 of layer conv1d_3 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 1000, 2297, 1]
Why is that and how can I pass the batch size correctly?
CodePudding user response:
In the first version of your code, you did it correctly.
how can I pass the batch size correctly?
You don't need to pass batch_size as part of input_shape. In Keras, input_shape describes a single sample and the batch dimension (None) is prepended automatically; passing input_shape=(1000, 2297, 1) therefore builds a 4-D input of shape [None, 1000, 2297, 1], which is exactly the ndim=4 error you see. Instead, set the batch size in model.fit(..., batch_size=1000).
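If you ever do need the batch size fixed inside the model (e.g. for stateful RNN layers), Keras 2.x layers also accept a batch_input_shape argument; this is an optional alternative, not something required here:
# Optional alternative (Keras 2.x): bake the batch size into the model itself.
# Every batch must then contain exactly 1000 samples.
from tensorflow.keras.layers import Conv1D
layer = Conv1D(32, 2, activation='relu', batch_input_shape=(1000, 2297, 1))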
If your model takes very long to train:
- Make sure to train your model on a GPU.
- Use fewer filters. (You use filters=256 in the second Conv1D layer; I only use 16 or 32.)
- Use larger strides; they shrink the output length and therefore the compute of every following layer (see the shape check below). Conv1D defaults to strides=1; I use strides=2.
- Keep kernel_size small. (You use kernel_size=2, and that's OK.)
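With the default padding='valid', a Conv1D output length is (input_length - kernel_size) // strides + 1. A minimal check that this formula reproduces the shapes in the model summary below:
# Conv1D output length with padding='valid':
# out = (in - kernel_size) // strides + 1
def conv1d_out_len(in_len, kernel_size, strides=1):
    return (in_len - kernel_size) // strides + 1

l1 = conv1d_out_len(2297, kernel_size=2, strides=2)  # 1148
l2 = conv1d_out_len(l1, kernel_size=2, strides=2)    # 574
l3 = conv1d_out_len(l2, kernel_size=2, strides=1)    # 573
print(l1, l2, l3)  # matches the model.summary() output below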
Full Code: (training time for 10 epochs -> 10 sec)
import tensorflow as tf
import numpy as np

# Random data with the same shapes as in the question, just for demonstration.
X_train = np.random.rand(21000, 2297, 1)
y_train = np.random.randint(0, 2, 21000)
X_val = np.random.rand(9000, 2297, 1)
y_val = np.random.randint(0, 2, 9000)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv1D(filters=16,
                                 kernel_size=2,
                                 strides=2,
                                 activation='relu',
                                 input_shape=(2297, 1)))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Conv1D(filters=32,
                                 kernel_size=2,
                                 strides=2,
                                 activation='relu'))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Conv1D(16, 2, activation='relu'))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='mse', metrics=['mae', 'mse'])
model.summary()
history = model.fit(X_train, y_train,
                    batch_size=128,   # batch size goes here, not in input_shape
                    epochs=10,
                    validation_data=(X_val, y_val),
                    verbose=1)
Output:
Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d_21 (Conv1D)           (None, 1148, 16)          48
dropout_12 (Dropout)         (None, 1148, 16)          0
conv1d_22 (Conv1D)           (None, 574, 32)           1056
dropout_13 (Dropout)         (None, 574, 32)           0
conv1d_23 (Conv1D)           (None, 573, 16)           1040
flatten_6 (Flatten)          (None, 9168)              0
dense_6 (Dense)              (None, 1)                 9169
=================================================================
Total params: 11,313
Trainable params: 11,313
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
165/165 [==============================] - 3s 13ms/step - loss: 0.2608 - mae: 0.5008 - mse: 0.2608 - val_loss: 0.2747 - val_mae: 0.4993 - val_mse: 0.2747
Epoch 2/10
165/165 [==============================] - 2s 13ms/step - loss: 0.2521 - mae: 0.4991 - mse: 0.2521 - val_loss: 0.2865 - val_mae: 0.4993 - val_mse: 0.2865
Epoch 3/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2499 - mae: 0.4969 - mse: 0.2499 - val_loss: 0.2988 - val_mae: 0.4991 - val_mse: 0.2988
Epoch 4/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2484 - mae: 0.4952 - mse: 0.2484 - val_loss: 0.2850 - val_mae: 0.4993 - val_mse: 0.2850
Epoch 5/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2481 - mae: 0.4926 - mse: 0.2481 - val_loss: 0.2650 - val_mae: 0.5001 - val_mse: 0.2650
Epoch 6/10
165/165 [==============================] - 2s 9ms/step - loss: 0.2457 - mae: 0.4899 - mse: 0.2457 - val_loss: 0.2824 - val_mae: 0.4998 - val_mse: 0.2824
Epoch 7/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2432 - mae: 0.4856 - mse: 0.2432 - val_loss: 0.2591 - val_mae: 0.5005 - val_mse: 0.2591
Epoch 8/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2426 - mae: 0.4824 - mse: 0.2426 - val_loss: 0.2649 - val_mae: 0.5009 - val_mse: 0.2649
Epoch 9/10
165/165 [==============================] - 2s 10ms/step - loss: 0.2392 - mae: 0.4781 - mse: 0.2392 - val_loss: 0.2693 - val_mae: 0.5009 - val_mse: 0.2693
Epoch 10/10
165/165 [==============================] - 1s 9ms/step - loss: 0.2366 - mae: 0.4733 - mse: 0.2366 - val_loss: 0.2688 - val_mae: 0.5012 - val_mse: 0.2688
Note:
- I use random numbers as input.
- Samples = 21000, batch_size = 128 -> steps per epoch = ceil(21000 / 128) = ceil(164.06) = 165, which is the 165/165 you see in each epoch's log.