Keras won't broadcast-multiply the model output with a mask designed for the entire mini-batch

I have a data generator that produces batches of input data (X) and targets (Y), along with a mask (batch_mask) to apply to the model output. The same mask applies to every data point in a batch; different batches get different masks, and the data generator takes care of that.

As a result, the first dimension of batch_mask could have size 1 or batch_size (the latter by repeating the same mask batch_size times along the first dimension). I expected Keras to accept either, and I wanted to simply create masks with size 1 along the first dimension.

However, when I tried this, I got the error:

ValueError: Data cardinality is ambiguous:
  x sizes: 128, 1
  y sizes: 128
Make sure all arrays contain the same number of samples.

Why won't Keras broadcast along the first dimension? It seems like this should not be complicated.

Here is a minimal example that reproduces this behavior:

import tensorflow.keras as tfk
import numpy as np

#######################
# 1. model definition #
#######################

# model parameters
nfeatures_in = 6
target_size = 8

# model inputs
input = tfk.layers.Input(shape=(nfeatures_in,))
input_mask = tfk.layers.Input(shape=(target_size,))

# model graph
out = tfk.layers.Dense(target_size)(input)
out_masked = tfk.layers.Multiply()((out,input_mask)) # multiply all model outputs in the batch by the same mask
model = tfk.Model(inputs=(input, input_mask), outputs=out_masked)

##########################
# 2. dummy data creation #
##########################

batch_size = 32

# create the mask for the batch
zeros_vector = np.zeros((1,target_size)) # "batch_size"==1
zeros_vector[0,:6] = 1
batch_mask = zeros_vector

# dummy data creation
X = np.random.randn(batch_size, nfeatures_in)
Y = np.random.randn(batch_size, target_size)*batch_mask # the target is masked by design in each batch


############################
# 3. compile model and fit #
############################

model.compile(optimizer="Adam", loss="mse")
model.fit((X, batch_mask), Y, batch_size=batch_size)  # raises the ValueError shown above

I know I could make this work by either:

1. Repeating the mask so that the first dimension of batch_mask matches the first dimension of X (instead of 1).
2. Using pure TensorFlow (but I feel like broadcasting along the batch dimension should not be a problem for Keras).
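
The first workaround does not have to copy any data. A minimal sketch, assuming the mask is a NumPy array of shape (1, target_size): np.broadcast_to returns a read-only view with the batch dimension expanded, while np.repeat makes an independent copy. Either result has one row per sample.

```python
import numpy as np

batch_size, target_size = 32, 8

batch_mask = np.zeros((1, target_size))
batch_mask[0, :6] = 1

# Read-only view: no data is copied, every row aliases the single mask row.
mask_view = np.broadcast_to(batch_mask, (batch_size, target_size))

# Independent copy with the mask repeated along axis 0.
mask_copy = np.repeat(batch_mask, batch_size, axis=0)

print(mask_view.shape, mask_copy.shape)  # (32, 8) (32, 8)
```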

How can I make this work with Keras?

Thank you!

Answer:
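
As for why the call fails: fit treats axis 0 of every input array as the sample axis and splits each array into mini-batches along it, so the arrays have to agree on the number of samples before any layer (and any broadcasting inside Multiply) runs. The sketch below imitates that splitting with plain NumPy. It only illustrates the idea and is not Keras's actual implementation; the helper name slice_batch is made up for this example.

```python
import numpy as np

n_samples, batch_size, target_size = 128, 32, 8

X = np.random.randn(n_samples, 6)
mask = np.zeros((1, target_size))
mask[0, :6] = 1

def slice_batch(arrays, start, stop):
    # Hypothetical helper: take the same row range from every array,
    # the way a per-sample batching scheme would.
    return [a[start:stop] for a in arrays]

# First batch: the mask still has its single row.
x0, m0 = slice_batch((X, mask), 0, batch_size)
print(x0.shape, m0.shape)  # (32, 6) (1, 8)

# Second batch: there is no mask row left to pair with these samples.
x1, m1 = slice_batch((X, mask), batch_size, 2 * batch_size)
print(x1.shape, m1.shape)  # (32, 6) (0, 8)
```

Rather than guess whether a single row was meant to be shared by all samples, Keras rejects the mismatch up front.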

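One way around this is to create an IdentityLayer that takes batch_mask as a constructor argument and returns it as a tensor, so the mask never goes through fit's sample-count check. Note that the mask is then fixed when the model is built.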

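import tensorflow as tf  # np and tfk are imported as in the question

class IdentityLayer(tfk.layers.Layer):
    """Ignores its input and returns a fixed mask as a float32 tensor."""
    def __init__(self, my_mask, **kwargs):
        super().__init__(**kwargs)  # forward name, dtype, etc. to the base Layer
        self.my_mask = my_mask
    def call(self, _):
        # the input is unused; the stored mask is returned as-is
        return tf.convert_to_tensor(self.my_mask, dtype=tf.float32)
    def get_config(self):
        config = super().get_config()
        config.update({
            # store a nested list so the config stays JSON-serializable
            "my_mask": np.asarray(self.my_mask).tolist(),
        })
        return config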

Using IdentityLayer in a model is straightforward:

# model inputs
input = tfk.layers.Input(shape=(nfeatures_in,))
input_mask = IdentityLayer(batch_mask)(input)  # input only hooks the layer into the graph

# model graph
out = tfk.layers.Dense(target_size)(input)
out_masked = tfk.layers.Multiply()((out,input_mask)) 
model = tfk.Model(inputs=input, outputs=out_masked)

Here, batch_mask is the NumPy array created as in your question:

zeros_vector = np.zeros((1,target_size)) # "batch_size"==1
zeros_vector[0,:6] = 1
batch_mask = zeros_vector
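
For masks that change from batch to batch, as in the question, one option is to keep the two-input model and have the data generator expand each mask to the batch size before yielding it. A rough sketch under assumptions: masked_batches and its arguments are hypothetical names, and the masks are taken from a list in round-robin order purely for illustration.

```python
import numpy as np

def masked_batches(X, Y, masks, batch_size):
    """Yield ((x_batch, mask_batch), y_batch) with one mask per batch.

    Hypothetical helper: `masks` is a list of arrays of shape
    (1, target_size), cycled through in order.
    """
    n_batches = len(X) // batch_size
    for i in range(n_batches):
        rows = slice(i * batch_size, (i + 1) * batch_size)
        mask = masks[i % len(masks)]
        # Expand the single mask row to one row per sample in this batch.
        mask_batch = np.repeat(mask, batch_size, axis=0)
        # The target is masked too, as in the question.
        yield (X[rows], mask_batch), Y[rows] * mask

X = np.random.randn(128, 6)
Y = np.random.randn(128, 8)
masks = [np.eye(8)[:1], np.ones((1, 8))]  # two example masks of shape (1, 8)

(x_b, m_b), y_b = next(masked_batches(X, Y, masks, batch_size=32))
print(x_b.shape, m_b.shape, y_b.shape)  # (32, 6) (32, 8) (32, 8)
```

A generator along these lines should be usable with model.fit in place of the NumPy arrays, together with the two-input model from the question; for more than one epoch it would need to loop over the data repeatedly.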