How to randomly remove spaces from a tensor of strings in TensorFlow

Time:02-22

I have a tensor of strings. For each string, I want to remove n randomly chosen spaces, where n <= the number of spaces in that string.

import tensorflow as tf

strings_with_spaces = tf.constant([b'A B C D E F', b'1 2 3 4 5 6'])
remove_spaces_randomly = None  # placeholder: this is the function I need to write
T = tf.map_fn(remove_spaces_randomly, strings_with_spaces)

I expect the output to be something like:
['A BC D EF', '1 2 34 5 6']

CodePudding user response:

You can try something like this:

import tensorflow as tf

strings_with_spaces = tf.constant([b'A B C D E F', b'1 2 3 4 5 6'])

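def remove_spaces_randomly(x, n):
  # Split the string into a 1-D tensor of single characters.
  split_string = tf.strings.unicode_split(x, 'UTF-8')
  # Positions of all spaces, shape [num_spaces, 1].
  indices = tf.where(tf.equal(split_string, ' '))
  # Never try to remove more spaces than the string contains.
  n = tf.minimum(n, tf.shape(indices)[0])
  # Overwrite n randomly chosen spaces with empty strings.
  output = tf.tensor_scatter_nd_update(split_string, tf.random.shuffle(indices)[:n], tf.repeat([''], repeats=n))
  # Join the characters back into one scalar string.
  return tf.strings.reduce_join(output)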

n = 3
T = tf.map_fn(lambda x: remove_spaces_randomly(x, n), strings_with_spaces)
print(T)
Sample output (the spaces are picked at random, so it changes between runs):
tf.Tensor([b'A BCDE F' b'1 23 456'], shape=(2,), dtype=string)
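
The function above removes the same fixed n from every string. If n should itself be random for each string, which the question's wording (n <= the number of spaces) seems to allow, a minimal sketch could look like the one below. The name remove_random_spaces and the choice to draw n uniformly from 0 up to the number of spaces are assumptions made for illustration; they are not part of the original answer.

```python
import tensorflow as tf

def remove_random_spaces(s):
  # Hypothetical variant: removes a random number of spaces from one scalar string.
  chars = tf.strings.unicode_split(s, 'UTF-8')   # 1-D tensor of characters
  space_idx = tf.where(tf.equal(chars, ' '))     # shape [k, 1], int64
  k = tf.shape(space_idx)[0]                     # number of spaces in the string
  # Assumption: n is uniform over 0..k. maxval is exclusive, hence k + 1.
  n = tf.random.uniform([], minval=0, maxval=k + 1, dtype=tf.int32)
  chosen = tf.random.shuffle(space_idx)[:n]      # n random space positions
  # Replace the chosen spaces with empty strings, then glue the characters together.
  updated = tf.tensor_scatter_nd_update(chars, chosen, tf.fill([n], ''))
  return tf.strings.reduce_join(updated, axis=0)

strings_with_spaces = tf.constant(['A B C D E F', '1 2 3 4 5 6'])
result = tf.map_fn(remove_random_spaces, strings_with_spaces,
                   fn_output_signature=tf.string)
print(result)
```

Because n can be drawn as 0, some strings may come back unchanged. Change minval to 1 if at least one space should always be removed from strings that contain a space.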