Obviously, I know that adding validation data makes training take longer, but the time difference I am talking about here is absurd. Code:
# Training
def training(self, callback_bool):
    if callback_bool:
        callback_list = []  # callbacks would be added here
    else:
        callback_list = []

    self.history = self.model.fit(self.x_train, self.y_train,
                                  validation_data=(self.x_test, self.y_test),
                                  batch_size=1, steps_per_epoch=10, epochs=100,
                                  callbacks=callback_list)
The code above takes me more than 30 minutes to train, even though my test data is only 10,000 data points. My training data is 40,000 data points, and when I train without validation data, I am done within seconds. Is there a way to remedy this? Why does it take so long? To boot, I am training on a GPU as well!
CodePudding user response:
I assume validation works as intended, and the problem is in the training process itself. You are using batch_size=1 and steps_per_epoch=10, which means the model sees only 10 data points per epoch. That is why training alone takes only a few seconds. On the other hand, you don't pass the validation_steps argument, so the validation run after every epoch continues until your validation dataset is exhausted, i.e. for 10,000 steps. Hence the difference in times. You can read more about model.fit and its arguments in the official documentation.
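To put rough numbers on that, here is the per-epoch workload implied by the settings in the question (plain Python, just arithmetic using the dataset sizes given above):

batch_size = 1
steps_per_epoch = 10      # explicit in the fit() call
val_samples = 10_000      # size of the test set

train_batches = steps_per_epoch              # 10 forward/backward passes per epoch
val_batches = val_samples // batch_size      # 10,000 single-sample forward passes per epoch
print(train_batches, val_batches)            # 10 vs 10000

Over 100 epochs that is 1,000 training steps against 1,000,000 validation steps, which is why validation dominates the wall-clock time.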
If your training dataset isn't infinite, I suggest removing the steps_per_epoch argument. If it is, pass it the value len(x_train) // batch_size instead. That way the model is fed every training data point in each epoch. I expect each epoch will then take ~1.5 hours instead of the seconds you see now. I also suggest increasing the batch_size if there is no specific reason to use a batch size of 1.
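For concreteness, a minimal sketch of the suggested change, assuming the same class and attribute names as in the question (self.model, self.x_train, and so on); the batch size of 64 is only an illustrative choice, not something from the original post:

def training(self, callback_bool):
    callback_list = []  # populate with callbacks when callback_bool is True
    # No steps_per_epoch: Keras derives the number of batches per epoch
    # from len(x_train) and batch_size, so every sample is seen each epoch.
    self.history = self.model.fit(self.x_train, self.y_train,
                                  validation_data=(self.x_test, self.y_test),
                                  batch_size=64,   # illustrative value; tune for your GPU memory
                                  epochs=100,
                                  callbacks=callback_list)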