From the TF documentation: "one_hot": Encodes each individual element in the input into an array the same size as the vocabulary.
import tensorflow as tf

alphabet = set("abcdefghijklmnopqrstuvwxyz")
one_hot_encoder = tf.keras.layers.StringLookup(vocabulary=list(alphabet), output_mode='one_hot')
print(len(alphabet)) #26
print(one_hot_encoder("a").shape) #(27,)
As far as I understand it, this should encode to a tensor of shape (26,). Why does it encode to shape (27,) instead? Is there an extra label to represent "no class"?
CodePudding user response:
Position 0 is reserved for the OOV (out-of-vocabulary) token, which accounts for the extra entry: 26 vocabulary tokens plus 1 OOV slot gives 27. The sketch below shows the default behaviour.
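A minimal sketch of the default behaviour (the vocabulary is sorted here only to make the order deterministic): an unknown string maps to the reserved index 0, while "a" maps to index 1.

import tensorflow as tf

# Default StringLookup keeps one OOV slot at index 0 (num_oov_indices=1).
alphabet = sorted("abcdefghijklmnopqrstuvwxyz")
encoder = tf.keras.layers.StringLookup(vocabulary=alphabet, output_mode='one_hot')

print(encoder("a").shape)               # (27,): 26 letters + 1 OOV slot
print(tf.argmax(encoder("a")).numpy())  # 1 -> 'a' sits right after the OOV slot
print(tf.argmax(encoder("?")).numpy())  # 0 -> unknown strings hit the OOV slot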
If you don't want the OOV slot, set num_oov_indices to 0:
one_hot_encoder = tf.keras.layers.StringLookup(vocabulary=list(alphabet), num_oov_indices=0, output_mode='one_hot')
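With num_oov_indices=0 the one-hot vector drops to 26 entries, but (per the StringLookup docs) inputs that are not in the vocabulary will now cause an error instead of mapping to a reserved slot. A quick sketch, continuing from the encoder defined just above:

print(one_hot_encoder("a").shape)  # (26,): no OOV slot anymore

try:
    one_hot_encoder("?")           # "?" is not in the vocabulary
except Exception as err:           # typically an InvalidArgumentError
    print("OOV input rejected:", type(err).__name__)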