Generator batch size and batch size as parameter of model.fit()


Hi, I have a question about the difference between the batch size set in my generate_train_data function and the batch size passed as a parameter to model.fit(). If I want to train with a batch size of 32, and I have already set that as the default in my generator (below), do I need to set it again in model.fit()?

How I load the datasets:

import numpy as np
from tensorflow.keras.utils import image_dataset_from_directory

BATCH_SIZE = 32
IMG_SIZE = (224, 224)

# Returns a tf.data.Dataset that is already batched into groups of BATCH_SIZE.
train_dataset = image_dataset_from_directory(data_dir,
                                             shuffle=True,
                                             label_mode='categorical',
                                             validation_split=0.2,
                                             batch_size=BATCH_SIZE,
                                             seed=42,
                                             subset="training",
                                             image_size=IMG_SIZE)

validation_dataset = image_dataset_from_directory(data_dir,
                                                  shuffle=True,
                                                  label_mode='categorical',
                                                  validation_split=0.2,
                                                  batch_size=BATCH_SIZE,
                                                  seed=42,
                                                  subset="validation",
                                                  image_size=IMG_SIZE)

# The dataset is already batched, so skip()/take() count 32-image batches,
# not individual images; train_size is a batch count for a 70% split.
train_size = int(0.7 * 54305 / 32)
test_dataset = train_dataset.skip(train_size)
train_dataset = train_dataset.take(train_size)
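
Since image_dataset_from_directory returns an already-batched tf.data.Dataset, each dataset element is one full batch. A quick sanity check I can run (assuming TensorFlow 2.x):

# Peek at one element: its shape confirms the batch size is baked in.
for images, labels in train_dataset.take(1):
    print(images.shape)  # (32, 224, 224, 3)
    print(labels.shape)  # (32, num_classes) because label_mode='categorical'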

How I define the generator function:

def generate_train_data(batch_size=32):
  for image_batch, labels_batch in train_dataset:
      # The last batch can be smaller than 32, so size the buffers per batch.
      batch_size = len(image_batch)
      x_batch = np.zeros((batch_size, 224, 224, 3))
      y_batch = np.zeros((batch_size,))
      c_batch = np.zeros((batch_size,))

      # decode_predictions() is presumably a custom helper that maps the
      # one-hot labels back to 'Specie___Disease' class-name strings; it
      # depends only on labels_batch, so call it once per batch.
      classes = decode_predictions(labels_batch)
      for i in range(batch_size):
        specie_position = specie_list.index(classes[i][0].split('___')[0])
        disease_position = disease_list.index(classes[i][0].split('___')[1])

        x_batch[i] = image_batch[i]
        y_batch[i] = specie_position
        c_batch[i] = disease_position

      # Yield once per filled batch, not once per sample, so every training
      # step sees a complete batch of images.
      yield x_batch, [y_batch, c_batch]
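
Each yield produces one full image batch plus two label arrays (species and disease) for the two model heads. A quick smoke test (a sketch, assuming specie_list and disease_list are already defined):

x, (y, c) = next(generate_train_data())
print(x.shape, y.shape, c.shape)  # (32, 224, 224, 3) (32,) (32,)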

Below is my model.fit() call:

import itertools

# cycle() replays the generators so they are never exhausted between epochs.
train_gen = itertools.cycle(generate_train_data())
test_gen = itertools.cycle(generate_validation_data())

_ = model.fit(
    train_gen,
    validation_data=test_gen,
    steps_per_epoch=steps_per_epoch,
    validation_steps=val_steps,
    epochs=100,
    # batch_size=32,  ## Do I need to set this?
    callbacks=keras_callbacks,
)
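
steps_per_epoch and val_steps are not shown above; the key point is that one step consumes one yielded batch, so they count batches rather than images. A sketch of one way to compute them (assuming TensorFlow 2.x and the variables from the snippets above):

import tensorflow as tf

# train_size already counts the 32-image batches kept by take().
steps_per_epoch = train_size
# cardinality() returns the number of (batched) elements in the dataset.
val_steps = int(tf.data.experimental.cardinality(validation_dataset).numpy())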

CodePudding user response:

You don't have to, as documented in the Model.fit API reference: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

The docs say batch_size is an integer or None: the number of samples per gradient update. If unspecified, batch_size defaults to 32. Do not specify batch_size if your data is in the form of datasets, generators, or keras.utils.Sequence instances, since they generate batches themselves.
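
In other words, the batch size is baked into the dataset (or generator) itself, and fit() simply consumes whatever each element of the iterator yields. A minimal sketch of the difference, with a hypothetical toy model and data (assumes TensorFlow 2.x):

import numpy as np
import tensorflow as tf

x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# NumPy arrays are unbatched, so batch_size matters here: 64/32 = 2 steps.
model.fit(x, y, batch_size=32, epochs=1)

# A tf.data.Dataset is already batched; each element is treated as one batch,
# and passing batch_size alongside it raises a ValueError.
ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)
model.fit(ds, epochs=1)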
