I was trying to understand how K.layers.Dropout might be implemented, since the literature always describes it as random, independent sampling of a 0/1 mask for each element.
Since the literature is pretty clear to me, I moved on to coding it and stumbled upon an issue: because TF uses graphs, we don't know the number of samples in the batch ahead of time. In particular:
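    import tensorflow as tf
    K = tf.keras  # alias used in this question

    class CustomLayer(K.layers.Layer):
        def call(self, inputs):
            tf.print(inputs.shape)  # static shape, known at trace time
            return inputs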
will indeed print None as the first dimension (assuming eager execution is turned off).
Having said that, how is TF able to sample an independent mask for each sample in each minibatch?
My best guess at the moment is that it uses something like tf.vectorized_map to get good performance while still drawing a random mask for each element in the minibatch.
CodePudding user response:
I traced the code for tf.keras.layers.Dropout.call (TensorFlow 2.9) in an effort to answer the following question:
how is TF able to sample an independent mask for each sample in each minibatch?
In summary, noise is sampled from a uniform distribution on [0, 1) with the same shape as the input (including the batch dimension). This gives every element of every sample its own independent draw. The noise array is then turned into a boolean mask by comparing it against the dropout rate. All of this assumes that one keeps noise_shape=None when instantiating the Dropout layer.
I have copied the relevant lines below.
noise_shape = _get_noise_shape(x, noise_shape)
# Sample a uniform distribution on [0.0, 1.0) and select values larger
# than or equal to `rate`.
random_tensor = uniform_sampler(shape=noise_shape, dtype=x_dtype)
keep_mask = random_tensor >= rate
ret = gen_math_ops.mul(ret, gen_math_ops.cast(keep_mask, x_dtype))
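To make the copied lines concrete, here is a minimal sketch of the same idea using only public TensorFlow ops. It is not the library's implementation: the name `sketch_dropout`, the constant `RATE`, and the input signature with an unknown batch dimension are assumptions made for illustration. The point is that the mask shape comes from `tf.shape(x)`, so a full-size mask can be sampled even though the static batch dimension is `None`.

```python
import tensorflow as tf

RATE = 0.5  # hypothetical dropout rate for this sketch

# The batch dimension is declared as None, as it would be in graph mode.
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 4], dtype=tf.float32)])
def sketch_dropout(x):
    # tf.shape(x) is a tensor that resolves to the real shape when the graph runs.
    noise_shape = tf.shape(x)
    # One uniform draw on [0, 1) per element of x, batch dimension included.
    random_tensor = tf.random.uniform(noise_shape, dtype=x.dtype)
    keep_mask = random_tensor >= RATE
    # Scale kept values by 1 / (1 - rate), then zero out the dropped ones.
    scaled = x / (1.0 - RATE)
    return scaled * tf.cast(keep_mask, x.dtype)

out = sketch_dropout(tf.ones([3, 4]))
out_bigger_batch = sketch_dropout(tf.ones([7, 4]))
```

The same traced function handles both batch sizes, and each call draws a fresh mask with one entry per input element.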
When noise_shape=None in the Dropout layer, _get_noise_shape returns the shape of the input x. This is done with the graph-compatible method tf.shape, which evaluates the shape of the tensor at runtime.
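The difference between the static shape (what the question printed) and the runtime shape from `tf.shape` can be seen in a small sketch. The function name `dynamic_shape` and the list `traced_static_shapes` are hypothetical helpers introduced only for this illustration.

```python
import tensorflow as tf

traced_static_shapes = []  # filled in by Python code that runs only during tracing

@tf.function(input_signature=[tf.TensorSpec(shape=[None, 3], dtype=tf.float32)])
def dynamic_shape(x):
    # x.shape is the static shape: the batch dimension is unknown (None) here.
    traced_static_shapes.append(x.shape.as_list())
    # tf.shape(x) is a graph op, so it yields the actual shape at run time.
    return tf.shape(x)

shape_a = dynamic_shape(tf.zeros([5, 3]))
shape_b = dynamic_shape(tf.zeros([2, 3]))
```

Here the static shape recorded at trace time has `None` as its first entry, while `shape_a` and `shape_b` hold the real batch sizes of each call.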
Here is an overview of the process for the TensorFlow / Keras v2 API.
- Instantiate a tf.keras.layers.Dropout layer (with noise_shape=None).
- Call the Dropout.call instance method on an input x.
- Call self._random_generator.dropout, which calls BaseRandomLayer._random_generator.dropout, which calls tf.nn.experimental.stateless_dropout.
  - There is conditional logic in BaseRandomLayer._random_generator.dropout: the v2 API will use stateless_dropout and the v1 API will use tf.nn.dropout.
- Call the private method _dropout, which then constructs the noise array to be the same shape as the input tensor x.
- Apply the noise array to the input, and return the result.
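The last stop on the v2 path above, `tf.nn.experimental.stateless_dropout`, can also be called directly. The following is a small usage sketch rather than a reproduction of what Keras does internally; the input tensor and the seed value are arbitrary choices for the example.

```python
import tensorflow as tf

x = tf.ones([4, 6])
seed = tf.constant([1, 2], dtype=tf.int32)  # arbitrary shape-[2] seed for this sketch

# noise_shape is left at its default (None), so the mask has the full shape
# of x and every element of every sample is dropped independently.
y1 = tf.nn.experimental.stateless_dropout(x, rate=0.5, seed=seed)
y2 = tf.nn.experimental.stateless_dropout(x, rate=0.5, seed=seed)

# Stateless ops are deterministic in their seed: same seed, same mask.
same_seed_same_mask = bool(tf.reduce_all(y1 == y2))
```

Because the op is stateless, the caller has to vary the seed between calls to get a different mask each time.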