Custom text pre-processing saved in TensorFlow model

How can I write custom text pre-processing that can be saved as part of a model?

Suppose that I would like to have two features:

1. Auto-correct the string input with some function. Words might change after this operation.
2. Apply query expansion to the string input, so that the resulting text/tokens might contain a few additional words (for which weights would be trained).

Something like this:

fli to London -> Fly to London
fly to London -> Fly to London loc_city

The loc_city token would need to be in the vocabulary in advance, which could be done.

After steps 1 and/or 2, should the result be fed to a TextVectorization / Embedding layer?

There is the standardize callback, but I do not see an obvious way of doing this with the existing tf.strings operations.

Ideally, there would be a callback function / layer that accepts a string (or tokens) and maps it to another string (or string tokens).

User response:

You can get the first character of a string like this:

import tensorflow as tf

class StringLayer(tf.keras.layers.Layer):
  def __init__(self):
    super().__init__()

  def call(self, inputs):
    # inputs has shape (batch, 1); bytes_split returns a RaggedTensor of single bytes.
    chars = tf.squeeze(tf.strings.bytes_split(inputs), axis=1)
    # Pad to a dense tensor, then keep the first byte of every row.
    return chars.to_tensor()[:, 0]

s = tf.constant([['next_string'], ['some_string']])
layer = StringLayer()
print(layer(s))
# tf.Tensor([b'n' b's'], shape=(2,), dtype=string)
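
Building on the same custom-layer pattern, the two steps from the question could be sketched with lookup tables. This is only a rough illustration under some assumptions: the names CorrectAndExpand and make_table and the two dictionaries are hypothetical, matching is done on whole whitespace-separated words only, and case is left untouched. A real auto-correct function would likely need more than a fixed table. The result is then passed to a TextVectorization layer whose vocabulary already contains loc_city.

```python
import tensorflow as tf


def make_table(mapping):
    """Builds an immutable string -> string lookup table; misses return ""."""
    init = tf.lookup.KeyValueTensorInitializer(
        tf.constant(list(mapping.keys())),
        tf.constant(list(mapping.values())))
    return tf.lookup.StaticHashTable(init, default_value="")


class CorrectAndExpand(tf.keras.layers.Layer):
    """Hypothetical layer: word-level auto-correct, then query expansion."""

    def __init__(self, corrections, expansions, **kwargs):
        super().__init__(**kwargs)
        self.corrections = make_table(corrections)
        self.expansions = make_table(expansions)

    def call(self, inputs):
        # inputs: 1-D string tensor, one sentence per element.
        words = tf.strings.split(inputs)   # RaggedTensor of shape [batch, None]
        flat = words.flat_values           # every word of the batch, 1-D

        # Step 1: replace known misspellings, keep every other word as-is.
        fixed = self.corrections.lookup(flat)
        flat = tf.where(tf.equal(fixed, ""), flat, fixed)

        # Step 2: append an extra token after each word that has one.
        extra = self.expansions.lookup(flat)
        flat = tf.where(tf.equal(extra, ""), flat,
                        tf.strings.join([flat, extra], separator=" "))

        # Re-assemble one string per input sentence.
        return tf.strings.reduce_join(
            words.with_flat_values(flat), axis=-1, separator=" ")


layer = CorrectAndExpand(
    corrections={"fli": "fly"},
    expansions={"London": "loc_city"})
sentences = tf.constant(["fli to London", "fly to Paris"])
expanded = layer(sentences)
print(expanded)
# expected values: b'fly to London loc_city', b'fly to Paris'

# The extra token is listed in the vocabulary in advance.
vectorizer = tf.keras.layers.TextVectorization(
    standardize=None,
    vocabulary=["fly", "to", "London", "Paris", "loc_city"])
ids = vectorizer(expanded)
print(ids)
```

Because the layer uses only TensorFlow string and lookup ops, it can run inside the model graph rather than in separate Python code. Whether the tables are restored correctly by a given save format is worth verifying for the TensorFlow/Keras version in use.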