Hi, I have a question about the difference between the batch size set in my generate_train_data function and the batch size passed as a fit() parameter. If I want to train with a batch size of 32, and I have already set it as the default in my generator as shown below, do I need to set it again in model.fit()?
How I load the datasets:
BATCH_SIZE = 32
IMG_SIZE = (224, 224)

train_dataset = image_dataset_from_directory(
    data_dir,
    shuffle=True,
    label_mode='categorical',
    validation_split=0.2,
    batch_size=BATCH_SIZE,
    seed=42,
    subset="training",
    image_size=IMG_SIZE,
)
validation_dataset = image_dataset_from_directory(
    data_dir,
    shuffle=True,
    label_mode='categorical',
    validation_split=0.2,
    batch_size=BATCH_SIZE,
    seed=42,
    subset="validation",
    image_size=IMG_SIZE,
)
train_size = int(0.7 * 54305 / 32)  # 70% of the 54305 images, counted in batches of 32
test_dataset = train_dataset.skip(train_size)
train_dataset = train_dataset.take(train_size)
How I define the generator function:
def generate_train_data(batch_size=32):
    x_batch = np.zeros((batch_size, 224, 224, 3))
    y_batch = np.zeros((batch_size,))
    c_batch = np.zeros((batch_size,))
    for image_batch, labels_batch in train_dataset:
        batch_size = len(image_batch)
        classes = decode_predictions(labels_batch)
        for i in range(batch_size):
            specie_position = specie_list.index(classes[i][0].split('___')[0])
            disease_position = disease_list.index(classes[i][0].split('___')[1])
            x_batch[i] = image_batch[i]
            y_batch[i] = specie_position
            c_batch[i] = disease_position
        yield x_batch, [y_batch, c_batch]
Below is my model.fit() call:
import itertools

train_gen = itertools.cycle(generate_train_data())
test_gen = itertools.cycle(generate_validation_data())

_ = model.fit(
    train_gen,
    validation_data=test_gen,
    steps_per_epoch=steps_per_epoch,
    validation_steps=val_steps,
    epochs=100,
    # batch_size=32,  ## Do I need to set this?
    callbacks=keras_callbacks,
)
CodePudding user response:
You don't have to set it again; in fact, you should leave it out entirely. See the documentation here: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

The batch_size argument is an integer or None: the number of samples per gradient update. If unspecified, batch_size defaults to 32. Do not specify batch_size if your data is in the form of datasets, generators, or keras.utils.Sequence instances, since they already generate batches. Your generator yields whole batches of 32 samples at a time, so fit() simply consumes one batch per step and its own batch_size argument is never used.
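To make this concrete, here is a minimal sketch (with hypothetical shapes matching your 224x224 RGB images) of a generator like yours. The key point: each item the generator yields already has the batch size as its leading dimension, so fit() just pulls one item per training step and has nothing left to batch.

```python
import numpy as np

def batch_generator(batch_size=32, n_batches=3):
    """Yields (features, labels) pairs that are already full batches,
    just like generate_train_data() above."""
    for _ in range(n_batches):
        x = np.zeros((batch_size, 224, 224, 3), dtype=np.float32)
        y = np.zeros((batch_size,), dtype=np.float32)
        yield x, y  # leading dimension IS the batch size

first_x, first_y = next(batch_generator())
print(first_x.shape)  # (32, 224, 224, 3) -- the batch dimension is already there
```

If you wanted a different batch size, you would change it at the source, i.e. pass a different batch_size to image_dataset_from_directory (and to your generator's preallocated arrays), not to fit().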