How to mask elements with padding in tensorflow?

Time:12-28

Given two example tensors input and mask:

>>> input = tf.random.normal([2,3,5])
>>> input
<tf.Tensor: shape=(2, 3, 5), dtype=float32, numpy=
array([[[ 1.1260294 , -0.05932725,  0.85893923, -1.5332409 ,
          0.6681451 ],
        [ 0.8833729 ,  0.8421117 , -0.60990584,  0.08593109,
          0.5969471 ],
        [ 0.20015325, -0.9459327 , -1.0818844 , -1.7254639 ,
         -0.51545954]],

       [[-0.36073774, -0.24315724,  1.5217028 ,  1.5075827 ,
          0.05745999],
        [-0.2570101 ,  1.5501927 , -0.17113225,  0.16063859,
         -0.95638955],
        [ 0.48955616,  0.11943919, -0.3523262 ,  0.10750653,
          1.1027677 ]]], dtype=float32)>

>>> mask = tf.constant([[0,1,0],[1,0,1]])
>>> mask
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[0, 1, 0],
       [1, 0, 1]], dtype=int32)>

I need to drop the rows of input where mask is 0. However, the number of dropped rows may differ between examples in the batch, so to keep the output a valid (rectangular) tensor it should be zero-padded like this:

>>> masked_input
<tf.Tensor: shape=(2, 3, 5), dtype=float32, numpy=
array([[[ 0.8833729 ,  0.8421117 , -0.60990584,  0.08593109,
          0.5969471 ],
        [ 0 , 0 , 0 , 0 ,
         0],
        [ 0 , 0 , 0 , 0 ,
         0]],

       [[-0.36073774, -0.24315724,  1.5217028 ,  1.5075827 ,
          0.05745999],
        [ 0.48955616,  0.11943919, -0.3523262 ,  0.10750653,
          1.1027677 ],
        [ 0 , 0 , 0 , 0 ,
          0]]], dtype=float32)>

That is, the output keeps only the rows where mask is 1, moved to the front of each example, with zero-padding at the end so that the output is a valid tensor.

I've searched around and tried using:

  1. tf.gather, but I can't figure out how to proceed with it.
  2. tf.boolean_mask, but it doesn't support padding: it flattens the masked dimensions and returns only the kept rows, as shown below:
>>> tf.boolean_mask(input, mask)
<tf.Tensor: shape=(3, 5), dtype=float32, numpy=
array([[ 0.8833729 ,  0.8421117 , -0.60990584,  0.08593109,  0.5969471 ],
       [-0.36073774, -0.24315724,  1.5217028 ,  1.5075827 ,  0.05745999],
       [ 0.48955616,  0.11943919, -0.3523262 ,  0.10750653,  1.1027677 ]],
      dtype=float32)>
  3. tf.ragged.boolean_mask, which is by far the closest to what I want. It keeps the batch dimension but still doesn't support padding, so the result is a ragged tensor.

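A similar request is discussed on GitHub: https://github.com/tensorflow/tensorflow/issues/18238

In short, the desired behaviour would be something like the following (keepdims and pad_val are proposed arguments, not part of the current tf.boolean_mask API):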

tensor = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mask = np.array([[True, False, True], [False, False, True], [True, True, True]])
boolean_mask(tensor, mask, keepdims=False) # [1, 3, 6, 7, 8, 9]
boolean_mask(tensor, mask, keepdims=True, pad_val=0) # [[1, 3, 0], [6, 0, 0], [7, 8, 9]] 
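The padded behaviour described above can be sketched by combining tf.ragged.boolean_mask with RaggedTensor.to_tensor. This is only a minimal sketch: the helper name boolean_mask_padded and its pad_val parameter are hypothetical (they are not part of TensorFlow), and it assumes the tensor has a fully known static shape, as in eager mode.

```python
import tensorflow as tf


def boolean_mask_padded(tensor, mask, pad_val=0):
    """Hypothetical helper: keep entries where `mask` is true, left-align
    them within each row, and pad the remainder with `pad_val`."""
    tensor = tf.convert_to_tensor(tensor)
    mask = tf.cast(mask, tf.bool)
    # Ragged result: each row may keep a different number of entries.
    ragged = tf.ragged.boolean_mask(tensor, mask)
    # Pad back to the (static) shape of the original tensor.
    return ragged.to_tensor(default_value=pad_val, shape=tensor.shape)


tensor = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mask = tf.constant([[True, False, True],
                    [False, False, True],
                    [True, True, True]])
print(boolean_mask_padded(tensor, mask))
# rows become [1, 3, 0], [6, 0, 0], [7, 8, 9]
```

The same helper should also cover the [2, 3, 5] input with a [2, 3] mask from the question, since the mask only needs to match the leading dimensions of the tensor.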

CodePudding user response:

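You can zero out the masked rows with the following (note that this keeps every row in its original position; it does not move the kept rows to the front):

input * tf.cast(mask[..., None], tf.float32)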
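To make the difference concrete, here is a small sketch of what the multiplication does, plus one possible way to then move the kept rows to the front. The reordering step is an addition for illustration, not part of the answer above: it assumes that tf.argsort breaks ties by original position, and it uses a deterministic stand-in tensor x in place of the random input.

```python
import tensorflow as tf

# Deterministic stand-in for `input`: values 1..30 so that no row is all zeros.
x = tf.reshape(tf.range(1.0, 31.0), [2, 3, 5])
mask = tf.constant([[0, 1, 0], [1, 0, 1]])

# Zero out the masked rows in place; kept rows stay where they were.
zeroed = x * tf.cast(mask[..., None], x.dtype)

# Order each example's rows so that mask == 1 comes first,
# with ties keeping their original relative order.
order = tf.argsort(mask, axis=1, direction="DESCENDING", stable=True)

# Reorder the rows per example; zeroed rows end up at the end.
compacted = tf.gather(zeroed, order, batch_dims=1)
print(compacted)
```

With this mask, zeroed keeps row 1 of the first example in the middle position, while compacted moves it to position 0 and leaves the zero rows at the end.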

CodePudding user response:

To drop the masked rows of input and pad the result so that it is a valid tensor, tf.boolean_mask can be used to extract the rows to keep, tf.sequence_mask to build a left-aligned boolean mask marking where those rows should land, and tf.scatter_nd to write them into a zero tensor of the original shape.

# Extract the rows of `input` where `mask` is 1 (shape [num_kept, 5])
kept = tf.boolean_mask(input, tf.cast(mask, tf.bool))

# Build a left-aligned mask: the first `lengths[i]` positions of example i are True
lengths = tf.reduce_sum(mask, axis=1)
target = tf.sequence_mask(lengths, maxlen=tf.shape(input)[1])

# Scatter the kept rows into a zero tensor with the same shape as `input`
masked_input = tf.scatter_nd(tf.where(target), kept, tf.shape(input, out_type=tf.int64))

print(masked_input)

This should output a tensor with the same shape as input, in which the rows marked as 0 in mask are removed and each example is padded with zeros at the end. Hope it helps.

