How is Keras Dropout actually performed?

I was trying to understand how K.layers.Dropout might be implemented, since in the literature it is always described as random, independent sampling of a 0/1 mask for each element.

Since the literature is pretty clear to me, I moved on to coding it and stumbled upon an issue: because TF uses graphs, the number of samples in a batch is not known when the graph is built. In particular:

import tensorflow as tf

class CustomLayer(tf.keras.layers.Layer):
  def call(self, inputs):
    tf.print(inputs.shape)  # static shape; the batch dimension may be None
    return inputs

will print None as the first dimension (assuming eager execution is turned off).

Having said that, how is TF able to sample an independent mask for each sample in each minibatch?

At the moment, my best guess is that they use something like tf.vectorized_map to stay fast while still sampling a random mask for each element in the minibatch.

User response:

I traced the code for tf.keras.layers.Dropout.call (TensorFlow 2.9) in an effort to answer the following question:

how is TF able to sample an independent mask for each sample in each minibatch?

In summary, a tensor of values drawn uniformly from [0, 1) is sampled with the same shape as the input (including the batch dimension), so every element of every sample gets its own independent draw. The noise tensor is then turned into a boolean keep mask by comparing it against the dropout rate. All of this assumes that noise_shape=None is kept when instantiating the Dropout layer.

I have copied the relevant lines below.

noise_shape = _get_noise_shape(x, noise_shape)
# Sample a uniform distribution on [0.0, 1.0) and select values larger
# than or equal to `rate`.
random_tensor = uniform_sampler(shape=noise_shape, dtype=x_dtype)
keep_mask = random_tensor >= rate
ret = gen_math_ops.mul(ret, gen_math_ops.cast(keep_mask, x_dtype))

In the case that noise_shape=None in the Dropout layer, _get_noise_shape will return the shape of the input x. This is done with the graph-compatible method tf.shape, which evaluates the shape of the tensor at runtime.
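To make the static-versus-runtime shape distinction concrete, here is a minimal sketch (not the library's code). The function name manual_dropout, the fixed rate of 0.5, the [None, 4] input signature and the traced_static_shapes list are all assumptions made up for illustration. It shows that the static shape seen while tracing has None for the batch dimension, while tf.shape supplies the real shape when the graph runs, which is enough to sample one random value per element.

```python
import tensorflow as tf

traced_static_shapes = []  # illustrative helper: records what tracing sees

@tf.function(input_signature=[tf.TensorSpec(shape=[None, 4], dtype=tf.float32)])
def manual_dropout(x):
    rate = 0.5  # assumed dropout rate for this sketch
    # Python code runs only while tracing; the batch dimension is unknown here.
    traced_static_shapes.append(x.shape.as_list())
    # tf.shape is a graph op, so it is evaluated with the actual input at run time.
    dynamic_shape = tf.shape(x)
    # One uniform draw in [0, 1) per element, including the batch dimension.
    random_tensor = tf.random.uniform(dynamic_shape, dtype=x.dtype)
    keep_mask = random_tensor >= rate
    # Kept values are scaled by 1 / (1 - rate), as in inverted dropout.
    scaled = x * (1.0 / (1.0 - rate))
    return scaled * tf.cast(keep_mask, x.dtype), dynamic_shape

dropped, runtime_shape = manual_dropout(tf.ones([3, 4]))
tf.print(traced_static_shapes[0], runtime_shape)
```

No per-sample loop or tf.vectorized_map is needed in this sketch: a single random op with a runtime-determined shape covers the whole minibatch.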

Here is an overview of the process for the TensorFlow / Keras v2 API.

1. Instantiate the tf.keras.layers.Dropout layer (with noise_shape=None).
2. Call the Dropout.call instance method on an input x.
3. Call self._random_generator.dropout, which calls BaseRandomLayer._random_generator.dropout, which calls tf.nn.experimental.stateless_dropout.
   - There is conditional logic in BaseRandomLayer._random_generator.dropout: the v2 API will use stateless_dropout and the v1 API will use tf.nn.dropout.
4. Call the private method _dropout, which then constructs the noise array to be the same shape as the input tensor x.
5. Apply the noise array to the input, and return the result.
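The function at the end of that chain can also be called directly. The sketch below is only an illustration: the input, the seed value and the rate are arbitrary choices, not anything taken from Keras internals. It contrasts the default noise_shape=None (one independent draw per element) with an explicit noise_shape of [4, 1], where one draw per row is broadcast across the features.

```python
import tensorflow as tf

x = tf.ones([4, 3])
seed = [1, 0]  # arbitrary illustrative seed; stateless ops take a shape-[2] seed

# Default noise_shape=None: the mask has the same shape as x,
# so each element is kept or dropped independently.
per_element = tf.nn.experimental.stateless_dropout(x, rate=0.5, seed=seed)

# noise_shape=[4, 1]: a single draw per row, broadcast over the 3 features,
# so each row is either kept entirely or dropped entirely.
per_row = tf.nn.experimental.stateless_dropout(
    x, rate=0.5, seed=seed, noise_shape=[4, 1]
)

# Stateless ops are deterministic given the same seed.
repeat = tf.nn.experimental.stateless_dropout(x, rate=0.5, seed=seed)

print(per_element.numpy())
print(per_row.numpy())
```

Because the op is stateless, whoever calls it has to supply a fresh seed on each call to get a different mask per training step.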