How to write custom text pre-processing that could be saved as part of a model?
Suppose that I would like to have two features:
- auto-correct the string input with some function. Words might change after this operation.
- do query expansion of the string input, so that the resulting text/tokens might contain a few additional words (for which weights would be trained).
Something like this:
fli to London -> Fly to London
fly to London -> Fly to London loc_city
(the loc_city token would need to be in the vocabulary in advance, which could be done)
After steps 1 and/or 2, feed the result to a TextVectorization / Embedding layer?
TextVectorization has a standardize callback, but I do not see an obvious way of doing this with the existing tf.strings operations.
Ideally, there would be a callback function / layer which accepts a string (or tokens) and maps it to another string (or string tokens).
Answer:
You can get the first character of a string like this:
import tensorflow as tf

class StringLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(StringLayer, self).__init__()

    def call(self, inputs):
        # Split each string into bytes, drop the singleton axis, pad to a dense
        # tensor and keep only the first character of every row.
        return tf.squeeze(tf.strings.bytes_split(inputs), axis=1).to_tensor()[:, 0]

s = tf.constant([['next_string'], ['some_string']])
layer = StringLayer()
print(layer(s))
# tf.Tensor([b'n' b's'], shape=(2,), dtype=string)
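
The same idea can be pushed a bit further towards the two features from the question. Below is a minimal sketch, not a definitive implementation: it assumes that both the auto-correction and the query expansion can be expressed as word-level dictionary lookups. The class name CorrectAndExpand and the two dictionaries are hypothetical and only serve as illustration. It relies on tf.lookup.StaticHashTable and tf.strings ops, so everything stays inside the TensorFlow graph.

```python
import tensorflow as tf


class CorrectAndExpand(tf.keras.layers.Layer):
    """Hypothetical layer: word-level auto-correct, then query expansion."""

    def __init__(self, corrections, expansions, **kwargs):
        super().__init__(**kwargs)
        # Assumption: both steps are plain word -> string dictionary lookups.
        self._fix_table = self._make_table(corrections)
        self._expand_table = self._make_table(expansions)

    @staticmethod
    def _make_table(mapping):
        init = tf.lookup.KeyValueTensorInitializer(
            tf.constant(list(mapping.keys())),
            tf.constant(list(mapping.values())),
        )
        # Words missing from the table map to the empty string.
        return tf.lookup.StaticHashTable(init, default_value="")

    def call(self, inputs):
        # inputs: string tensor of shape [batch]; split on whitespace.
        words = tf.strings.split(inputs)  # ragged [batch, num_words]
        flat = words.flat_values

        # Step 1: replace a word when a correction exists, else keep it.
        fixed = self._fix_table.lookup(flat)
        flat = tf.where(tf.equal(fixed, ""), flat, fixed)

        # Step 2: append the expansion token right after a matching word.
        extra = self._expand_table.lookup(flat)
        expanded = tf.strings.join([flat, extra], separator=" ")
        flat = tf.where(tf.equal(extra, ""), flat, expanded)

        # Re-assemble one string per example; padding leaves trailing
        # spaces, which strip() removes.
        dense = words.with_flat_values(flat).to_tensor(default_value="")
        joined = tf.strings.reduce_join(dense, axis=-1, separator=" ")
        return tf.strings.strip(joined)


layer = CorrectAndExpand(
    corrections={"fli": "Fly", "fly": "Fly"},
    expansions={"London": "loc_city"},
)
result = layer(tf.constant(["fli to London", "fly to Paris"]))
print(result)
# expected strings: "Fly to London loc_city" and "Fly to Paris"
```

The output is again a string tensor, so it could be passed on to TextVectorization and then an Embedding layer. Two caveats: as far as I know the default standardize setting of TextVectorization strips punctuation, including the underscore, so loc_city would become loccity unless standardize is changed; and whether the lookup tables are saved along with the model depends on the TensorFlow/Keras version and save format, which I have not verified here. A get_config method would also be needed to re-create the layer from its config.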