I created a Sequential preprocessing layer model like so:
import tensorflow.keras as keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dropout, RandomRotation
from tensorflow.keras.utils import set_random_seed; set_random_seed(72)
import matplotlib.pyplot as plt
(ax, ay), (qx, qy) = cifar10.load_data()
ay = keras.utils.to_categorical(ay, 10)
qy = keras.utils.to_categorical(qy, 10)
ax = ax.astype('float32'); ax /= 255
qx = qx.astype('float32'); qx /= 255
DA = Sequential([RandomRotation(180/360, fill_mode="nearest", interpolation="bilinear", input_shape=(32, 32, 3))])
I then displayed the first image and its transformed output using:
X=ax[0:1,:,:,:]
plt.imshow(X[0])
plt.show()
transformedX=DA(X).numpy()
plt.imshow(transformedX[0,:,:,:])
plt.show()
Result:
This is the expected output. The layer applied a random rotation to the image.
Then, I added the preprocessing model to another Sequential model containing nothing but it and a Dropout layer.
model = Sequential()
model.add(DA)
model.add(Dropout(0.25))
Finally, I displayed the images again in the same way as before, without using the new model at all:
X=ax[0:1,:,:,:]
plt.imshow(X[0])
plt.show()
transformedX=DA(X).numpy()
plt.imshow(transformedX[0,:,:,:])
plt.show()
Result:
I got this result both locally (in Spyder) and using Google Colab. Here's the notebook if you want to try it out.
From here on, every time I run the program, every image looks like the original (no rotation). To get the rotated result again, I need to Restart Runtime in Google Colab; %reset does not seem to work locally.
If I remove the input_shape=(32, 32, 3) argument from the preprocessing layer, the problem does not occur. However, I was under the impression that this argument was necessary in the first layer of a model.
Is this a real bug or a problem in my code?
If it is a bug, is it particular to some outdated version of Keras or TensorFlow?
CodePudding user response:
The reason for this is threefold. It is related to:
- How TF handles the training argument (or the lack thereof) passed to a layer call
- How the Dropout layer handles training=None
- How TF constructs Sequential models
Note that my answer is based on TF v2.9.1.
The training argument
Some layers, such as Dropout or RandomRotation, behave differently during training and inference. That's why, at their base, layers always try to identify whether a call is made during training whenever they are called via () (syntactic sugar for __call__). Internally, the training flag is set to, in priority order:
1. The training argument with a non-None value explicitly passed to the layer call, e.g., when you call the layer as layer(inputs, training=True/False).
2. The training argument determined by this very same 4-check procedure for its parent layer in a layer call chain.
3. The learning_phase variable of the backend, only if that variable has been set. Checking the variable's state is done by keras.backend.global_learning_phase_is_set() and getting its value is done by keras.backend.learning_phase().
4. The default value of training in this layer's call signature. Note that call ≠ __call__: the former is a TF-defined method and the latter is one of many built-in magic methods in Python, although the __call__ implementation of the base layer eventually invokes call at some point.

If none of the 4 checks yields a non-None value, then training=None is used.
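As a minimal sketch of check 1 (assuming TF 2.x, with names of my own choosing), an explicitly passed training argument wins over everything further down the list:
>>> import numpy as np, tensorflow as tf
>>> x = np.ones((1, 4), dtype="float32")
>>> drop = tf.keras.layers.Dropout(0.5)
>>> drop(x, training=False).numpy()   # explicit training=False: check 1 decides, Dropout is a no-op
array([[1., 1., 1., 1.]], dtype=float32)
>>> out = drop(x, training=True)      # explicit training=True: some entries zeroed, survivors scaled by 2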
The RandomRotation layer only rotates images if it sees training=True. Your call to it fails the first three checks but passes the last one, thanks to training defaulting to True in its call signature. Thus, the layer sees training=True and behaves as expected. However, as soon as you add Dropout, everything goes south. So what's happening?
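One quick way to confirm which default check 4 falls back to is to inspect the call signatures themselves (a sketch assuming TF 2.9; the defaults may differ in other versions):
>>> import inspect
>>> import tensorflow as tf
>>> inspect.signature(tf.keras.layers.RandomRotation.call)   # should show training=True in TF 2.9
>>> inspect.signature(tf.keras.layers.Dropout.call)          # should show training=None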
Dropout and training=None
It turns out that a call to Dropout that ends up with training=None can actually set the state (but not the value) of the learning_phase variable. This happens easily because, unlike RandomRotation, Dropout has a default of training=None, which provides no guard at check 4.
>>> keras.backend.global_learning_phase_is_set()
False
>>> _ = tf.keras.layers.Dropout(.25)([1,2,3])
>>> keras.backend.global_learning_phase_is_set()
True
Once that happens, check 4 is essentially ignored for all subsequent calls to any layer: they will use learning_phase (which defaults to 0) as training whenever they reach check 3, and stop there. Your later calls to RandomRotation fell victim to this, thinking they were being made during inference, and thus returned the input as-is.
More precisely, since Dropout won't accept None for training, it will try to fetch learning_phase directly, regardless of its state, by calling learning_phase() without first checking global_learning_phase_is_set(). This unchecked learning_phase() call sets the state of learning_phase in the process.
>>> keras.backend.global_learning_phase_is_set()
False
>>> keras.backend.learning_phase()
0
>>> keras.backend.global_learning_phase_is_set()
True
But I did not call Dropout?
Here comes the final part, which is about the way Sequential adds layers to its stack. When the first layer you add is not a keras tensor but has a known input shape, Sequential will create an input keras tensor with that exact shape and immediately call the layer on it to obtain an output keras tensor. This is possible because the input shape is already known.
>>> Sequential([RandomRotation(0.5)]).outputs is None
True
>>> Sequential([RandomRotation(0.5, input_shape=(2,2,1))]).outputs
[<KerasTensor: shape=(None, 2, 2, 1) dtype=float32 (created by layer 'random_rotation_7')>]
From there, each time you add another layer, the Sequential model checks whether it already has an output keras tensor (i.e., whether the input shape is already known). If so, it again immediately calls the new layer on the current output tensor to obtain an updated one. Otherwise, the input shape is unknown, and the model defers the construction of the output keras tensor until the model is called on actual input data.
>>> from tensorflow.keras.models import Sequential
>>> from tensorflow.keras.layers import RandomRotation, Dropout
>>> class DropoutWithCount(Dropout):
...     def __init__(self, rate, noise_shape=None, seed=None, **kwargs):
...         super().__init__(rate, noise_shape, seed, **kwargs)
...         self.count = 0
...
...     def call(self, inputs, training=None):
...         self.count += 1
...         print(f"Dropout called with training={training}, call counts = {self.count}")
...         return super().call(inputs, training)
...
>>> m = Sequential([RandomRotation(0.5, input_shape=(2,2,1)), DropoutWithCount(.25)])
Dropout called with training=None, call counts = 1
>>> m = Sequential([RandomRotation(0.5, input_shape=(2,2,1))])
>>> m1 = Sequential()
>>> m1.add(m)
>>> m1.add(DropoutWithCount(.25))
Dropout called with training=None, call counts = 1
>>> m = Sequential([RandomRotation(0.5), DropoutWithCount(.25)])
>>>
So yes, since the input shape is known, the Dropout layer is called by Sequential without any training argument as soon as it is added, which consequently sets the state of learning_phase.
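To see the whole chain in one place, you can watch the learning-phase state flip merely by constructing the model from the question (a sketch assuming a fresh runtime and TF 2.9, since the state stays set once flipped):
>>> import keras
>>> from tensorflow.keras.models import Sequential
>>> from tensorflow.keras.layers import RandomRotation, Dropout
>>> keras.backend.global_learning_phase_is_set()
False
>>> m = Sequential([RandomRotation(0.5, input_shape=(32, 32, 3)), Dropout(0.25)])
>>> keras.backend.global_learning_phase_is_set()   # Dropout was already called during construction
True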
What should I do?
Always pass the training argument explicitly to your model/layer calls, since explicit argument checking ranks highest in precedence. Otherwise, don't pass training to any calls and instead set the global learning_phase value to either True or False via keras.backend.set_learning_phase(True/False), as this takes precedence over the layers' default training values.
>>> from tensorflow.keras.models import Sequential
>>> from tensorflow.keras.layers import RandomRotation, Dropout
>>> import keras
>>> import numpy as np
>>> img = np.array([[[[1],[2]],[[3],[4]]]])
>>> m = Sequential([RandomRotation(0.5, input_shape=(2,2,1))])
>>> m(img)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[1.6862597],
[3.3725195]],
[[1.6274806],
[3.3137403]]]], dtype=float32)>
>>> m1 = Sequential()
>>> m1.add(m)
>>> m1.add(Dropout(.25))
>>> m(img)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[1.],
[2.]],
[[3.],
[4.]]]], dtype=float32)>
>>> m(img, training=True)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[1.8427435],
[3.685487 ]],
[[1.314513 ],
[3.1572566]]]], dtype=float32)>
>>> keras.backend.set_learning_phase(True)
>>> m(img)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[3.3871531],
[3.3064234]],
[[1.6935766],
[1.612847 ]]]], dtype=float32)>
CodePudding user response:
I would recommend taking a look at this post: Keras experimental RandomFlip and RandomRotation do not work with map. It may be better to use DA as a preprocessing step and feed its output to the model, rather than building the layer into the model itself. So, for instance, DA(X) is a one-time run, and you then use its output (rather than DA itself) as the input to your model.
Like:
model = DA
model.add(Dropout(0.25))
or roughly
y = DA(X)
z = model(y)
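A slightly fuller sketch of that one-shot idea (hypothetical names; training=True is passed explicitly, per the answer above, so the rotation is actually applied):
aug_X = DA(X, training=True)   # run the augmentation once, outside the model
preds = model(aug_X)           # reuse the augmented output wherever the model is called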