I created a Sequential preprocessing layer model like so:
import tensorflow.keras as keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dropout, RandomRotation
from tensorflow.keras.utils import set_random_seed; set_random_seed(72)
import matplotlib.pyplot as plt
(ax, ay), (qx, qy) = cifar10.load_data()
ay = keras.utils.to_categorical(ay, 10)
qy = keras.utils.to_categorical(qy, 10)
ax = ax.astype('float32'); ax /= 255
qx = qx.astype('float32'); qx /= 255
DA = Sequential([RandomRotation(180/360, fill_mode="nearest", interpolation="bilinear", input_shape=(32, 32, 3))])
I then displayed the first image and its transformed output using:
X=ax[0:1,:,:,:]
plt.imshow(X[0])
plt.show()
transformedX=DA(X).numpy()
plt.imshow(transformedX[0,:,:,:])
plt.show()
Result:
This is the expected output. The layer applied a random rotation to the image.
Then, I added the preprocessing model to another Sequential model containing nothing but it and a Dropout layer.
model = Sequential()
model.add(DA)
model.add(Dropout(0.25))
Finally, I displayed the images again in the same way as before, without using the new model at all:
X=ax[0:1,:,:,:]
plt.imshow(X[0])
plt.show()
transformedX=DA(X).numpy()
plt.imshow(transformedX[0,:,:,:])
plt.show()
Result:
I got this result both locally (in Spyder) and using Google Colab. Here's the notebook if you want to try it out.
From here on, every time I run the program, every image looks like the original (no rotation). To get the rotated result again, I need to Restart Runtime in Google Colab; %reset does not seem to work locally.
If I remove the input_shape=(32, 32, 3) argument from the preprocessing layer, the problem does not occur. However, I was under the impression that this argument was necessary in the first layer of a model.
Is this a real bug or a problem in my code?
If it is a bug, is it particular to some outdated version of Keras or TensorFlow?
CodePudding user response:
The reason for this is threefold. It is related to:
- How TF handles the training argument (or the lack thereof) passed to a layer call
- How the Dropout layer handles training=None
- How TF constructs Sequential models
Note that my answer is based on TF v2.9.1.
The training argument
Some layers, such as Dropout or RandomRotation, behave differently during training and inference. That's why, at their base, layers always try to identify whether a call is made during training whenever they are called via () (syntactic sugar for __call__). Internally, the training flag is set to, in priority order:
1. The training argument with a non-None value explicitly passed to the layer call, e.g., when you call the layer as layer(inputs, training=True/False).
2. The training argument determined by this very same 4-check procedure for its parent layer in a layer call chain.
3. The learning_phase variable of the backend, only if that variable has been set. Checking the variable's state is done by keras.backend.global_learning_phase_is_set() and getting its value is done by keras.backend.learning_phase().
4. The default value of training in this layer's call signature. Note that call ≠ __call__: the former is a TF-defined method and the latter is one of many built-in magic methods in Python, although the __call__ implementation of the base layer eventually invokes call at some point.

If none of the 4 checks yields a non-None value, then training=None is used.
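As a minimal sketch of check 1 (assuming TF 2.x, with names of my own choosing), an explicitly passed training argument wins over everything further down the list:
>>> import numpy as np, tensorflow as tf
>>> x = np.ones((1, 4), dtype="float32")
>>> drop = tf.keras.layers.Dropout(0.5)
>>> drop(x, training=False).numpy()   # explicit training=False: check 1 decides, Dropout is a no-op
array([[1., 1., 1., 1.]], dtype=float32)
>>> out = drop(x, training=True)      # explicit training=True: some entries zeroed, survivors scaled by 2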
The RandomRotation layer only rotates images if it sees training=True. Your call to it fails the first three checks but passes the last one, thanks to training defaulting to True in its call signature. Thus, the layer sees training=True and behaves as expected. However, as soon as you add Dropout, everything goes south. So what's happening?
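One quick way to confirm which default check 4 falls back to is to inspect the call signatures themselves (a sketch assuming TF 2.9; the defaults may differ in other versions):
>>> import inspect
>>> import tensorflow as tf
>>> inspect.signature(tf.keras.layers.RandomRotation.call)   # should show training=True in TF 2.9
>>> inspect.signature(tf.keras.layers.Dropout.call)          # should show training=None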
Dropout and training=None
It turns out that a call to Dropout that ends up with training=None can actually set the state (but not the value) of the learning_phase variable. This happens easily because, unlike RandomRotation, Dropout has a default of training=None, which provides no guard at check 4.
>>> keras.backend.global_learning_phase_is_set()
False
>>> _ = tf.keras.layers.Dropout(.25)([1,2,3])
>>> keras.backend.global_learning_phase_is_set()
True
Once that happens, check 4 is essentially ignored for all subsequent calls to any layer: they will use learning_phase (which defaults to 0) as training whenever they reach check 3, and stop there. Your later calls to RandomRotation fell victim to this, thinking they were being made during inference, and thus returned the input as-is.
More precisely, since Dropout won't accept None for training, it will try to fetch learning_phase directly, regardless of its state, by calling learning_phase() without first checking global_learning_phase_is_set(). This unchecked learning_phase() call sets the state of learning_phase in the process.
>>> keras.backend.global_learning_phase_is_set()
False
>>> keras.backend.learning_phase()
0
>>> keras.backend.global_learning_phase_is_set()
True
But I did not call Dropout?
Here comes the final part, which is about the way Sequential adds layers to its stack. When the first layer you add is not a keras tensor but has a known input shape, Sequential will create an input keras tensor with that exact shape and immediately call the layer on it to obtain an output keras tensor. This is possible because the input shape is already known.
>>> Sequential([RandomRotation(0.5)]).outputs is None
True
>>> Sequential([RandomRotation(0.5, input_shape=(2,2,1))]).outputs
[<KerasTensor: shape=(None, 2, 2, 1) dtype=float32 (created by layer 'random_rotation_7')>]
From there, each time you add another layer, the Sequential model checks whether it already has an output keras tensor (i.e., whether the input shape is already known). If so, it again immediately calls the new layer on the current output tensor to obtain an updated one. Otherwise, the input shape is unknown, and the model defers the construction of the output keras tensor until the model is called on actual input data.
>>> from tensorflow.keras.models import Sequential
>>> from tensorflow.keras.layers import RandomRotation, Dropout
>>> class DropoutWithCount(Dropout):
...     def __init__(self, rate, noise_shape=None, seed=None, **kwargs):
...         super().__init__(rate, noise_shape, seed, **kwargs)
...         self.count = 0
...
...     def call(self, inputs, training=None):
...         self.count += 1
...         print(f"Dropout called with training={training}, call counts = {self.count}")
...         return super().call(inputs, training)
...
>>> m = Sequential([RandomRotation(0.5, input_shape=(2,2,1)), DropoutWithCount(.25)])
Dropout called with training=None, call counts = 1
>>> m = Sequential([RandomRotation(0.5, input_shape=(2,2,1))])
>>> m1 = Sequential()
>>> m1.add(m)
>>> m1.add(DropoutWithCount(.25))
Dropout called with training=None, call counts = 1
>>> m = Sequential([RandomRotation(0.5), DropoutWithCount(.25)])
>>>
So yes, since the input shape is known, the Dropout layer is called by Sequential without any training argument as soon as it is added, which consequently sets the state of learning_phase.
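To see the whole chain in one place, you can watch the learning-phase state flip merely by constructing the model from the question (a sketch assuming a fresh runtime and TF 2.9, since the state stays set once flipped):
>>> import keras
>>> from tensorflow.keras.models import Sequential
>>> from tensorflow.keras.layers import RandomRotation, Dropout
>>> keras.backend.global_learning_phase_is_set()
False
>>> m = Sequential([RandomRotation(0.5, input_shape=(32, 32, 3)), Dropout(0.25)])
>>> keras.backend.global_learning_phase_is_set()   # Dropout was already called during construction
True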
What should I do?
Always pass the training argument explicitly to your model/layer calls, since explicit argument checking ranks highest in precedence. Otherwise, don't pass training to any calls and instead set the global learning_phase value to either True or False via keras.backend.set_learning_phase(True/False), as this takes precedence over the layers' default training values.
>>> from tensorflow.keras.models import Sequential
>>> from tensorflow.keras.layers import RandomRotation, Dropout
>>> import keras
>>> import numpy as np
>>> img = np.array([[[[1],[2]],[[3],[4]]]])
>>> m = Sequential([RandomRotation(0.5, input_shape=(2,2,1))])
>>> m(img)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[1.6862597],
[3.3725195]],
[[1.6274806],
[3.3137403]]]], dtype=float32)>
>>> m1 = Sequential()
>>> m1.add(m)
>>> m1.add(Dropout(.25))
>>> m(img)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[1.],
[2.]],
[[3.],
[4.]]]], dtype=float32)>
>>> m(img, training=True)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[1.8427435],
[3.685487 ]],
[[1.314513 ],
[3.1572566]]]], dtype=float32)>
>>> keras.backend.set_learning_phase(True)
>>> m(img)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[3.3871531],
[3.3064234]],
[[1.6935766],
[1.612847 ]]]], dtype=float32)>
CodePudding user response:
I would recommend taking a look at this post: Keras experimental RandomFlip and RandomRotation do not work with map. It may be better to use DA as a preprocessing step and feed its output to the model, rather than building the layer into the model itself. So, for instance, DA(X) is a one-time run, and you then use its output (rather than DA itself) as the input to your model.
Like:
model = DA
model.add(Dropout(0.25))
or roughly
y = DA(X)
z = model(y)
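A slightly fuller sketch of that one-shot idea (hypothetical names; training=True is passed explicitly, per the answer above, so the rotation is actually applied):
aug_X = DA(X, training=True)   # run the augmentation once, outside the model
preds = model(aug_X)           # reuse the augmented output wherever the model is called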