Does augmentation in Keras modify the dataset at each epoch, or is it performed only once?

I have a question regarding the image dataset function tf.keras.utils.image_dataset_from_directory and augmentation: if we set a seed in this dataset creation function and put some augmentation layers in the model (those layers also with a seed specified), will there be new augmented images at each epoch, or will it always train on the same set of augmented images?

Example:

# model construction
model_input = tf.keras.Input(shape=(96, 96, 3))
add_layers = tf.keras.layers.RandomFlip(mode='horizontal', seed=75398)(model_input)  # augmentation
add_layers = tf.keras.layers.RandomRotation(factor=0.1, seed=2143)(add_layers)  # augmentation, chained on the flip output
add_layers= #some nn layers# # model core
output = tf.keras.layers.Dense(1)(add_layers)
model = tf.keras.Model(inputs=model_input, outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
                  loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                  metrics=['accuracy'])
# datasets construction
dataset = tf.keras.utils.image_dataset_from_directory(dataset_directory,
                                                          labels=[1, 0, 1, 0, 0, 0, 1, 1, 1, ...], label_mode='int',
                                                          image_size=(96, 96), batch_size=32,
                                                          shuffle=True,
                                                          seed=2342)
dataset_length = [i for i, _ in enumerate(dataset)][-1] + 1  # number of batches
dataset = dataset.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)
validation_split = 0.1
train_set = dataset.skip(int(dataset_length * validation_split))
validation_set = dataset.take(int(dataset_length * validation_split))

# training
model.fit(train_set, validation_data=validation_set, epochs=5)

In this example, will the augmented images be the same during the 5 epochs, or will each epoch create a new set of augmented images? If it always uses the same images, how can I use augmentation to increase the number of images I train with?

CodePudding user response：

The images are augmented right when the image_dataset_from_directory method is executed. Hence, no, it will not create new images for each epoch. Also, for increasing the number of images with augmentation, try reading this blog:

https://machinelearningmastery.com/image-augmentation-with-keras-preprocessing-layers-and-tf-image/

CodePudding user response：

If you go into the source code of the base class BaseImageAugmentationLayer, from which RandomFlip and RandomRotation inherit, you will see that augmentations are performed randomly every time the layer is called with the training argument set to True.

Take RandomFlip for example. Every time it is called with training=True, it makes two calls to np.random.choice([True, False]) to decide whether the input image will be flipped horizontally and vertically, respectively. If the input is a batch containing multiple images, this is done on a per-image basis.
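The per-image coin tosses described above can be sketched in plain Python. This is only a toy stand-in, not the Keras implementation: toy_random_flip and toy_random_flip_batch are hypothetical names, images are nested lists rather than tensors, and the standard-library random module replaces NumPy.

```python
import random


def toy_random_flip(image, rng, horizontal=True, vertical=True):
    """Flip one image (a list of rows) based on two independent coin tosses."""
    # One toss per enabled direction, mimicking the two random choices above.
    flip_h = horizontal and rng.choice([True, False])
    flip_v = vertical and rng.choice([True, False])
    if flip_h:
        image = [row[::-1] for row in image]  # mirror left-right
    if flip_v:
        image = image[::-1]  # mirror top-bottom
    return image, (flip_h, flip_v)


def toy_random_flip_batch(batch, rng):
    """Decide the flips independently for every image in the batch."""
    return [toy_random_flip(img, rng) for img in batch]


rng = random.Random(0)
image = [[1, 2], [3, 4]]
batch = [image] * 8  # eight copies of the same image
results = toy_random_flip_batch(batch, rng)
decisions = [decision for _, decision in results]
print(decisions)  # one (flip_h, flip_v) pair per image
```

Even though the batch holds eight identical images, each one gets its own pair of decisions, so the outputs within a single call can differ.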

In short, augmentation layers do not care whether this is a new epoch or not. Their only job is to augment the inputs randomly every time they are called. The seed argument sets the initial state of the layer's random generator, but that state is not reset after each epoch.
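To make the point about the seed concrete, here is a minimal sketch using only the standard library. ToyAugmentLayer and run_epochs are hypothetical names invented for illustration and do not mirror Keras internals; the only assumption carried over is that the generator is seeded once at construction and never reset.

```python
import random


class ToyAugmentLayer:
    """Toy stand-in for a seeded augmentation layer (not Keras code)."""

    def __init__(self, seed):
        self._rng = random.Random(seed)  # seeded once, at construction

    def __call__(self, batch, training=False):
        if not training:
            return list(batch)  # inference: inputs pass through unchanged
        # Each row is reversed with probability 0.5; the generator state
        # simply keeps advancing from call to call.
        return [row[::-1] if self._rng.random() < 0.5 else row for row in batch]


def run_epochs(seed, batch, epochs):
    """Feed the same batch through one layer instance once per 'epoch'."""
    layer = ToyAugmentLayer(seed)
    return [layer(batch, training=True) for _ in range(epochs)]


batch = [[i, i + 1, i + 2] for i in range(16)]
run_a = run_epochs(75398, batch, epochs=5)
run_b = run_epochs(75398, batch, epochs=5)

print(run_a == run_b)  # same seed: the whole 5-epoch run is reproducible
print(len({str(epoch) for epoch in run_a}))  # number of distinct epochs
```

Under these assumptions the seed makes the entire training run repeatable, while the individual epochs within a run still see different augmented versions of the same data.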