Generate smaller tensors for each sequence of values

Consider this tensor.

    a = tf.constant([0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 19, 20, 21, 22, 23, 24])

I want to divide it into 3 tensors (for this specific example), containing the groups where the numbers are immediately adjacent. The expected output would be:

    output_tensor = [ [0, 1, 2, 3], [5, 6, 7, 8, 9, 10], [19, 20, 21, 22, 23, 24] ]

Any idea how to do this? Is there a TensorFlow tf.math method that can help do this efficiently? I couldn't find anything.

CodePudding user response：

For the example provided, tf.split should work:

    a = tf.constant([0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 19, 20, 21, 22, 23, 24])
    print(tf.split(a, [4, 6, 6]))

Output:

[<tf.Tensor: shape=(4,), dtype=int32, numpy=array([0, 1, 2, 3], dtype=int32)>, <tf.Tensor: shape=(6,), dtype=int32, numpy=array([ 5,  6,  7,  8,  9, 10], dtype=int32)>, <tf.Tensor: shape=(6,), dtype=int32, numpy=array([19, 20, 21, 22, 23, 24], dtype=int32)>]

The second argument sets the size of each output tensor along the split axis (0 by default), so in this case the first tensor has size 4, the second has size 6, and the third has size 6. Alternatively, an int can be provided, as long as the size of the tensor along the axis you're splitting on is evenly divisible by that value. In this case, 3 would not work (16/3 ≈ 5.33), but 4 would:

    a = tf.constant([0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 19, 20, 21, 22, 23, 24])
    print(tf.split(a, 4))

Output:

[<tf.Tensor: shape=(4,), dtype=int32, numpy=array([0, 1, 2, 3], dtype=int32)>, <tf.Tensor: shape=(4,), dtype=int32, numpy=array([5, 6, 7, 8], dtype=int32)>, <tf.Tensor: shape=(4,), dtype=int32, numpy=array([ 9, 10, 19, 20], dtype=int32)>, <tf.Tensor: shape=(4,), dtype=int32, numpy=array([21, 22, 23, 24], dtype=int32)>]

If the boundaries between the runs of consecutive numbers are not known in advance, the split sizes can be computed efficiently from adjacent differences and supplied to tf.split:

    import tensorflow as tf

    def compute_split_indices(x):
        # Needs eager execution (uses .numpy()); assumes x is 1-D with increasing values.
        adjacent_diffs = x[1:] - x[:-1]  # compute adjacent differences
        # index just after each gap, i.e. where a new run starts
        indices_where_not_continuous = tf.where(adjacent_diffs > 1) + 1
        splits = tf.concat([indices_where_not_continuous[:1], indices_where_not_continuous[1:] -
                            indices_where_not_continuous[:-1]], axis=0)  # compute split sizes from the indices
        splits_as_ints = [split.numpy().tolist()[0] for split in splits]  # convert to a list of integers for ease of use
        final_split_sizes = splits_as_ints + [len(x) - sum(splits_as_ints)]  # account for the rest of the tensor
        return final_split_sizes

    if __name__ == "__main__":
        a = tf.constant([0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 19, 20, 21, 22, 23, 24])
        splits = compute_split_indices(a)
        print(tf.split(a, splits))

Output:


[<tf.Tensor: shape=(4,), dtype=int32, numpy=array([0, 1, 2, 3], dtype=int32)>, <tf.Tensor: shape=(6,), dtype=int32, numpy=array([ 5,  6,  7,  8,  9, 10], dtype=int32)>, <tf.Tensor: shape=(6,), dtype=int32, numpy=array([19, 20, 21, 22, 23, 24], dtype=int32)>]

Notice the output is the same as when we explicitly supplied [4, 6, 6].
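
Since the question asks for a single nested result rather than a Python list of tensors, one possible variation is to build a tf.RaggedTensor from the same adjacent-difference idea. This is only a sketch: group_consecutive is a hypothetical helper name (not a TensorFlow API), and it assumes a 1-D integer tensor. It treats any step other than +1 as the start of a new group, which is slightly stricter than the `> 1` check used above.

```python
import tensorflow as tf


def group_consecutive(x):
    """Group a 1-D integer tensor into runs of consecutive values.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    # 1 wherever the next value does not directly follow the previous one.
    breaks = tf.cast(tf.not_equal(x[1:] - x[:-1], 1), tf.int64)
    # A running count of breaks gives each element its group (row) id;
    # the first element always belongs to row 0.
    row_ids = tf.concat([tf.zeros([1], dtype=tf.int64), tf.cumsum(breaks)], axis=0)
    return tf.RaggedTensor.from_value_rowids(x, row_ids)


a = tf.constant([0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 19, 20, 21, 22, 23, 24])
groups = group_consecutive(a)
print(groups.to_list())
```

For the example tensor this prints [[0, 1, 2, 3], [5, 6, 7, 8, 9, 10], [19, 20, 21, 22, 23, 24]]. Unlike the version based on tf.split, this sketch does not call .numpy(), so it should also be usable inside a tf.function.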