The TensorFlow model uses the following code for training:
model.fit(train_dataset,
          steps_per_epoch=10000,
          validation_data=test_dataset,
          epochs=20000
)
Here steps_per_epoch is 10000 and epochs is 20000.
Is it possible to split the training time over multiple days? For example:
day 1:
model.fit(..., steps_per_epoch=10000, ..., epochs=10, ...)
model.fit(..., steps_per_epoch=10000, ..., epochs=20, ...)
model.fit(..., steps_per_epoch=10000, ..., epochs=30, ...)
day 2:
model.fit(..., steps_per_epoch=10000, ..., epochs=100, ...)
day 3:
model.fit(..., steps_per_epoch=10000, ..., epochs=5, ...)
day (n):
model.fit(..., steps_per_epoch=10000, ..., epochs=n, ...)
The expected total number of epochs is:
20000 = day1 + day2 + day3 + ... + dayn
Can I simply stop model.fit and start model.fit again on another day? Is that the same as running once with epochs=20000?
CodePudding user response:
You can save your model after each day as a pickle file, then load it the next day and continue training.
Training the model on day_1:
import tensorflow_datasets as tfds
import tensorflow as tf
import joblib

# Load Fashion-MNIST as (image, label) pairs
train, test = tfds.load(
    'fashion_mnist',
    shuffle_files=True,
    as_supervised=True,
    split=['train', 'test']
)
train = train.repeat(15).batch(64).prefetch(tf.data.AUTOTUNE)
test = test.batch(64).prefetch(tf.data.AUTOTUNE)

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(28, 28, 1)))
model.add(tf.keras.layers.Conv2D(128, (3, 3), activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(512, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
# softmax (not sigmoid) so the 10 class probabilities sum to 1,
# matching SparseCategoricalCrossentropy(from_logits=False)
model.add(tf.keras.layers.Dense(10, activation='softmax'))

model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='adam', metrics=['accuracy'])
model.summary()

# The dataset is already batched, so batch_size must not be passed to fit()
model.fit(train, steps_per_epoch=150, epochs=3, verbose=1)
model.evaluate(test, verbose=1)
joblib.dump(model, 'model_day_1.pkl')
Output after day_1:
Epoch 1/3
150/150 [==============================] - 7s 17ms/step - loss: 23.0504 - accuracy: 0.5786
Epoch 2/3
150/150 [==============================] - 2s 16ms/step - loss: 0.9366 - accuracy: 0.7208
Epoch 3/3
150/150 [==============================] - 3s 17ms/step - loss: 0.7321 - accuracy: 0.7682
157/157 [==============================] - 1s 8ms/step - loss: 0.4627 - accuracy: 0.8405
INFO:tensorflow:Assets written to: ram://***/assets
['model_day_1.pkl']
Load the model on day_2 and continue training:
model = joblib.load("/content/model_day_1.pkl")
model.fit(train, steps_per_epoch=150, epochs=3, verbose=1)
model.evaluate(test, verbose=1)
joblib.dump(model, 'model_day_2.pkl')
Output after day_2:
Epoch 1/3
150/150 [==============================] - 3s 17ms/step - loss: 0.6288 - accuracy: 0.7981
Epoch 2/3
150/150 [==============================] - 2s 16ms/step - loss: 0.5290 - accuracy: 0.8222
Epoch 3/3
150/150 [==============================] - 2s 16ms/step - loss: 0.5124 - accuracy: 0.8272
157/157 [==============================] - 1s 5ms/step - loss: 0.4131 - accuracy: 0.8598
INFO:tensorflow:Assets written to: ram://***/assets
['model_day_2.pkl']
Load the model on day_3 and continue training:
model = joblib.load("/content/model_day_2.pkl")
model.fit(train, steps_per_epoch=150, epochs=3, verbose=1)
model.evaluate(test, verbose=1)
joblib.dump(model, 'model_day_3.pkl')
Output after day_3:
Epoch 1/3
150/150 [==============================] - 3s 17ms/step - loss: 0.4579 - accuracy: 0.8498
Epoch 2/3
150/150 [==============================] - 2s 17ms/step - loss: 0.4078 - accuracy: 0.8589
Epoch 3/3
150/150 [==============================] - 2s 16ms/step - loss: 0.4073 - accuracy: 0.8560
157/157 [==============================] - 1s 5ms/step - loss: 0.3997 - accuracy: 0.8603
INFO:tensorflow:Assets written to: ram://***/assets
['model_day_3.pkl']
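As an alternative to pickling, Keras also has a built-in save/load API (model.save and tf.keras.models.load_model) that stores the architecture, weights, and optimizer state, so training resumes where it left off without going through pickle. A minimal sketch of the same day-by-day workflow (the file names here are just examples):
import tensorflow as tf

# End of day 1: persist architecture, weights, and optimizer state
model.save('model_day_1.h5')

# Day 2: restore and keep training; the optimizer state (e.g. Adam's
# moment estimates) is restored too, so this continues rather than restarts
model = tf.keras.models.load_model('model_day_1.h5')
model.fit(train, steps_per_epoch=150, epochs=3, verbose=1)
model.save('model_day_2.h5')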
CodePudding user response:
I think you're asking whether multiple calls to model.fit will continue training the model (instead of starting from scratch). The answer is yes, it will. However, a new History object is returned by each model.fit call, so if you are capturing that, you may want to handle each one separately.
So running
model.fit(..., epochs=10)
model.fit(..., epochs=10)
will train the model for 20 epochs in total.
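If you also want the epoch numbering (and things like TensorBoard logs) to continue across those calls, model.fit takes an initial_epoch argument; note that epochs is then the index of the final epoch to reach, not a count of additional epochs. A minimal sketch (the dataset and step counts are placeholders from the question):
# Day 1: trains epochs 0..9
h1 = model.fit(train, steps_per_epoch=10000, epochs=10, initial_epoch=0)

# Day 2: resumes at epoch 10 and stops after epoch 19, i.e. 10 more
# epochs, because `epochs` is the end index rather than a count
h2 = model.fit(train, steps_per_epoch=10000, epochs=20, initial_epoch=10)

# Each call returns its own History object; stitch the metrics together by hand
full_loss = h1.history['loss'] + h2.history['loss']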