I have a 2D tensor in TensorFlow 2 (Python). How can I pick out and concatenate rows based on a ragged array of row indices, and then pad the shorter rows with zeros so that all rows end up with the same length?
Here is an example of what I have:
data = tf.constant([
[300, 301, 302],
[100, 101, 102],
[200, 201, 202],
[120, 121, 122],
[210, 211, 212],
[410, 411, 412],
[110, 111, 112],
[400, 401, 402],
], dtype=tf.float32)
row_ids = [ [ 1, 6, 3 ], [ 2, 4 ], [ 0 ], [ 7, 5] ]
And this is what I would like to get:
desired_result = tf.constant([
[ 100, 101, 102, 110, 111, 112, 120, 121, 122],
[ 200, 201, 202, 210, 211, 212, 0, 0, 0],
[ 300, 301, 302, 0, 0, 0, 0, 0, 0],
[ 400, 401, 402, 410, 411, 412, 0, 0, 0]
],
dtype=tf.float32
)
I have attempted to find a way with tf.RaggedTensor.from_value_rowids() and tf.gather_nd() with tf.concat(), but without any success.
I do need to backpropagate through this operation and, therefore, I need to stick to TensorFlow 2 operations.
Any suggestions would be greatly appreciated! Thanks!
CodePudding user response:
IIUC, you can actually solve this task more simply:
import tensorflow as tf
data = tf.constant([
[300, 301, 302],
[100, 101, 102],
[200, 201, 202],
[120, 121, 122],
[210, 211, 212],
[410, 411, 412],
[110, 111, 112],
[400, 401, 402],
], dtype=tf.float32)
row_ids = tf.ragged.constant([ [ 1, 6, 3 ], [ 2, 4 ], [ 0 ], [ 7, 5] ])
t = tf.gather(data, row_ids).to_tensor()
t = tf.reshape(t, [tf.shape(t)[0], tf.reduce_prod(tf.shape(t)[1:])])
<tf.Tensor: shape=(4, 9), dtype=float32, numpy=
array([[100., 101., 102., 110., 111., 112., 120., 121., 122.],
[200., 201., 202., 210., 211., 212., 0., 0., 0.],
[300., 301., 302., 0., 0., 0., 0., 0., 0.],
[400., 401., 402., 410., 411., 412., 0., 0., 0.]],
dtype=float32)>
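Since the question needs backpropagation, here is a minimal sketch for checking that gradients flow through tf.gather(), to_tensor() and tf.reshape(). It assumes TensorFlow 2.x in eager mode. The variable data_var, its values, and the sum used as a stand-in loss are illustrative and not part of the answer above.

```python
import tensorflow as tf

# Illustrative stand-in for trainable data: 8 rows of 3 features each.
data_var = tf.Variable(tf.reshape(tf.range(24, dtype=tf.float32), [8, 3]))
row_ids = tf.ragged.constant([[1, 6, 3], [2, 4], [0], [7, 5]])

with tf.GradientTape() as tape:
    t = tf.gather(data_var, row_ids).to_tensor()  # shape (4, 3, 3), zero-padded
    t = tf.reshape(t, [tf.shape(t)[0], -1])       # shape (4, 9)
    loss = tf.reduce_sum(t)                       # stand-in loss (assumption)

# Gradients of a gather may come back as tf.IndexedSlices; densify for inspection.
grad = tf.convert_to_tensor(tape.gradient(loss, data_var))
print(grad.shape)
```

If the gradient were None, the conversion in the last step would fail, so getting a tensor with the same shape as data_var is a quick sign that the padded positions do not break the gradient path.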
CodePudding user response:
I think I have found a solution that will work for me and hopefully others.
The idea is to:
- add a "pad row" to the original data
- extend the shorter index lists with the index of the pad row
- use tf.gather_nd() to pick out rows
- reshape the result to concatenate the inner dimensions
Here is the code:
# Add pad row
pad_row = tf.zeros(shape=[1, 3], dtype=tf.float32)
data_with_pad_row = tf.concat([data, pad_row], axis=0)
pad_row_no = data_with_pad_row.shape[0] - 1
# Extend indices
max_row_per_row = max([ len(rows_ids) for rows_ids in row_ids ])
new_row_ids = [ rows_ids + [ pad_row_no ]*(max_row_per_row-len(rows_ids)) for rows_ids in row_ids ]
new_row_ids = [ [ [ row_id ] for row_id in rows_ids ] for rows_ids in new_row_ids ]
# Gather and reshape
g3d = tf.gather_nd(indices=new_row_ids, params=data_with_pad_row)
result = tf.reshape(g3d, [g3d.shape[0], g3d.shape[1]*g3d.shape[2]])
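This produces the desired result and, because tf.concat(), tf.gather_nd() and tf.reshape() are all differentiable with respect to data, allows backpropagation through the operation.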
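The index-extension step above is plain Python, so it can be checked on its own without TensorFlow. The sketch below wraps it in a helper; the name pad_row_ids is hypothetical and chosen only for illustration, and pad_row_no=8 assumes the zero row was appended after the 8 original rows.

```python
def pad_row_ids(row_ids, pad_row_no):
    """Right-pad every index list with pad_row_no up to the longest list's length."""
    max_len = max(len(ids) for ids in row_ids)
    return [list(ids) + [pad_row_no] * (max_len - len(ids)) for ids in row_ids]


row_ids = [[1, 6, 3], [2, 4], [0], [7, 5]]
padded = pad_row_ids(row_ids, pad_row_no=8)  # 8 = index of the appended zero row
print(padded)  # [[1, 6, 3], [2, 4, 8], [0, 8, 8], [7, 5, 8]]
```

With the indices in this rectangular form, passing them straight to tf.gather(data_with_pad_row, padded) should give the same (4, 3, 3) tensor as tf.gather_nd() with the extra level of nesting. That is my reading of the API rather than something stated in the answer.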