A TextVectorization layer is used for word encoding, and the typical workflow calls its adapt() method. The documentation describes it as follows:
Next, you will call adapt to fit the state of the preprocessing layer to the dataset. This will cause the model to build an index of strings to integers.
(https://www.tensorflow.org/tutorials/keras/text_classification)
or
If desired, the user can call this layer's adapt() method on a dataset. When this layer is adapted, it will analyze the dataset, determine the frequency of individual string values, and create a 'vocabulary' from them.
(https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization#adapt)
What precisely is the result of the adapt() operation, and how can I concretely inspect the content of the created vocabulary?
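A small piece of my code:
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization

seq_length = 100
vocab_size = 50000
vectorize_layer = TextVectorization(
    max_tokens=vocab_size,
    output_mode='int',
    output_sequence_length=seq_length)
# build the vocabulary (text_ds is my tf.data.Dataset of raw strings)
vectorize_layer.adapt(text_ds)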
CodePudding user response:
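After adapt(), call get_vocabulary() on the layer; it returns the learned vocabulary as a list of strings, where each string's position is its integer index. TextVectorization exposes the same method, so vectorize_layer.get_vocabulary() works on your layer. The StringLookup documentation shows the idea: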
>>> import tensorflow as tf
>>> data = tf.constant([["a", "c", "d"], ["d", "z", "b"]])
>>> layer = tf.keras.layers.StringLookup()
>>> layer.adapt(data)
>>> layer.get_vocabulary()
['[UNK]', 'd', 'z', 'c', 'b', 'a']
https://www.tensorflow.org/api_docs/python/tf/keras/layers/StringLookup
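To make the "analyze the dataset, determine the frequency, create a vocabulary" description more concrete, here is a rough pure-Python sketch of the kind of result adapt() produces in 'int' output mode. It is not TensorFlow's implementation: build_vocabulary is a hypothetical helper, the punctuation stripping only approximates the layer's default standardization, and the alphabetical ordering of equally frequent tokens is an assumption of this sketch (the real layer may order ties differently). What it does mirror is that index 0 is reserved for padding (''), index 1 for out-of-vocabulary tokens ('[UNK]'), and the remaining tokens are ranked by descending frequency, with max_tokens capping the total size including the reserved entries.

```python
import re
import string
from collections import Counter


def build_vocabulary(texts, max_tokens=None):
    """Approximate the vocabulary list built by adapt() in 'int' mode (sketch only)."""
    counts = Counter()
    for text in texts:
        # Approximate the default standardization: lowercase, strip punctuation.
        cleaned = re.sub(f"[{re.escape(string.punctuation)}]", "", text.lower())
        # Default splitting is on whitespace.
        counts.update(cleaned.split())

    # Most frequent tokens first. Alphabetical tie-breaking is an
    # assumption of this sketch, not documented TensorFlow behaviour.
    ranked = sorted(counts, key=lambda tok: (-counts[tok], tok))

    # Index 0 is the padding token, index 1 the out-of-vocabulary token.
    reserved = ["", "[UNK]"]
    if max_tokens is not None:
        # max_tokens counts the reserved entries as well.
        ranked = ranked[: max(max_tokens - len(reserved), 0)]
    return reserved + ranked


vocab = build_vocabulary(["The cat sat.", "The cat ran!", "A dog ran"])
token_to_index = {tok: i for i, tok in enumerate(vocab)}
print(vocab)
print(token_to_index["cat"])
```

With the real layer, the equivalent inspection is vectorize_layer.get_vocabulary(), and the position of each string in that list is the integer the layer emits for it.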