TensorFlow gather, concatenate and then pad operation?

I have a 2D tensor in TensorFlow 2 (Python). How can I pick out and concatenate rows based on a ragged array of row indices, and then pad the shorter rows with zeros so that all rows end up with the same length?

Here is an example of what I have:

data = tf.constant([
    [300, 301, 302],
    [100, 101, 102],
    [200, 201, 202],
    [120, 121, 122],
    [210, 211, 212],
    [410, 411, 412],
    [110, 111, 112],
    [400, 401, 402],
], dtype=tf.float32)

row_ids = [[1, 6, 3], [2, 4], [0], [7, 5]]

And this is what I would like to get:

desired_result = tf.constant([
    [100, 101, 102, 110, 111, 112, 120, 121, 122],
    [200, 201, 202, 210, 211, 212,   0,   0,   0],
    [300, 301, 302,   0,   0,   0,   0,   0,   0],
    [400, 401, 402, 410, 411, 412,   0,   0,   0],
], dtype=tf.float32)

I have tried tf.RaggedTensor.from_value_rowids() and tf.gather_nd() combined with tf.concat(), but without success.

I do need to backpropagate through this operation and, therefore, I need to stick to TensorFlow 2 operations.

Any suggestions would be greatly appreciated! Thanks!

CodePudding user response：

IIUC, you can actually solve this task more simply:

import tensorflow as tf

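data = tf.constant([
    [300, 301, 302],
    [100, 101, 102],
    [200, 201, 202],
    [120, 121, 122],
    [210, 211, 212],
    [410, 411, 412],
    [110, 111, 112],
    [400, 401, 402],
], dtype=tf.float32)

# Ragged indices: each inner list names the rows to concatenate
row_ids = tf.ragged.constant([[1, 6, 3], [2, 4], [0], [7, 5]])

# Gathering with ragged indices returns a RaggedTensor; to_tensor() pads it with zeros
t = tf.gather(data, row_ids).to_tensor()
# Merge the last two dimensions so the gathered rows sit side by side
t = tf.reshape(t, [tf.shape(t)[0], tf.reduce_prod(tf.shape(t)[1:])])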

<tf.Tensor: shape=(4, 9), dtype=float32, numpy=
array([[100., 101., 102., 110., 111., 112., 120., 121., 122.],
       [200., 201., 202., 210., 211., 212.,   0.,   0.,   0.],
       [300., 301., 302.,   0.,   0.,   0.,   0.,   0.,   0.],
       [400., 401., 402., 410., 411., 412.,   0.,   0.,   0.]],
      dtype=float32)>
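
Since the question asks about backpropagation, here is a minimal sketch of how one might check that gradients flow through the gather, pad and reshape steps. It assumes the same data and row_ids as above; the names flat, loss and grad_dense are only illustrative, and the reshape uses -1 instead of tf.reduce_prod for brevity.

```python
import tensorflow as tf

data = tf.constant([
    [300, 301, 302],
    [100, 101, 102],
    [200, 201, 202],
    [120, 121, 122],
    [210, 211, 212],
    [410, 411, 412],
    [110, 111, 112],
    [400, 401, 402],
], dtype=tf.float32)

row_ids = tf.ragged.constant([[1, 6, 3], [2, 4], [0], [7, 5]])

with tf.GradientTape() as tape:
    # data is a constant, so it has to be watched explicitly
    tape.watch(data)
    gathered = tf.gather(data, row_ids)   # RaggedTensor of gathered rows
    dense = gathered.to_tensor()          # zero-padded dense tensor
    # -1 lets TensorFlow infer the merged size of the last two dimensions
    flat = tf.reshape(dense, [tf.shape(dense)[0], -1])
    loss = tf.reduce_sum(flat)

grad = tape.gradient(loss, data)
# The gradient of a gather may come back as tf.IndexedSlices; densify it
grad_dense = tf.convert_to_tensor(grad)
print(flat.shape, grad_dense.shape)
```

If grad came back as None, the chain would not be differentiable; here every row of data is gathered once, so each entry should receive a gradient.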

CodePudding user response：

I think I have found a solution that will work for me and hopefully others.

The idea is to:

1. add a "pad row" to the original data
2. extend the shorter indices arrays with the pad row number
3. use tf.gather_nd() to pick out rows
4. reshape the result to concatenate the inner dimensions

Here is the code:

# Add pad row
pad_row = tf.zeros(shape=[1, 3], dtype=tf.float32)
data_with_pad_row = tf.concat([data, pad_row], axis=0)
pad_row_no = data_with_pad_row.shape[0] - 1

# Extend indices (row_ids is the plain Python list of lists from the question)
max_row_per_row = max([ len(rows_ids) for rows_ids in row_ids ])
new_row_ids = [ rows_ids + [ pad_row_no ]*(max_row_per_row-len(rows_ids)) for rows_ids in row_ids ]
new_row_ids = [ [ [ row_id ] for row_id in rows_ids ] for rows_ids in new_row_ids ]

# Gather and reshape
g3d = tf.gather_nd(indices=new_row_ids, params=data_with_pad_row)
result = tf.reshape(g3d, [g3d.shape[0], g3d.shape[1]*g3d.shape[2]])

This gets the needed results and allows backpropagation through the operations.
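
For reuse, the pad-row steps could be packaged roughly as below. This is only a sketch: the function name gather_concat_pad is made up for illustration, it assumes row_ids is a plain Python list of lists and data has a static 2D shape, and it uses tf.gather in place of tf.gather_nd so the indices do not need to be wrapped one level deeper.

```python
import tensorflow as tf

def gather_concat_pad(data, row_ids):
    """Hypothetical helper: gather rows per index list, concatenate, zero-pad."""
    num_rows, num_cols = data.shape
    # Append one all-zero row; its index is used as the padding index
    pad_row = tf.zeros([1, num_cols], dtype=data.dtype)
    data_with_pad_row = tf.concat([data, pad_row], axis=0)
    pad_row_no = num_rows
    # Extend every index list to the length of the longest one
    max_len = max(len(ids) for ids in row_ids)
    padded_ids = [ids + [pad_row_no] * (max_len - len(ids)) for ids in row_ids]
    # Result of the gather has shape [len(row_ids), max_len, num_cols]
    gathered = tf.gather(data_with_pad_row, padded_ids)
    # Merge the last two dimensions to lay the gathered rows side by side
    return tf.reshape(gathered, [len(row_ids), max_len * num_cols])

data = tf.constant([
    [300, 301, 302],
    [100, 101, 102],
    [200, 201, 202],
    [120, 121, 122],
    [210, 211, 212],
    [410, 411, 412],
    [110, 111, 112],
    [400, 401, 402],
], dtype=tf.float32)

row_ids = [[1, 6, 3], [2, 4], [0], [7, 5]]
result = gather_concat_pad(data, row_ids)
print(result)
```

Because the padding happens on the Python index lists before any TensorFlow op runs, only concat, gather and reshape end up in the differentiable path.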