Why is StringLookup producing an extra label?

Time:09-27

From the TensorFlow documentation: "one_hot": Encodes each individual element in the input into an array the same size as the vocabulary.

import tensorflow as tf

alphabet = set("abcdefghijklmnopqrstuvwxyz")
one_hot_encoder = tf.keras.layers.StringLookup(vocabulary=list(alphabet), output_mode='one_hot')
print(len(alphabet))  # 26
print(one_hot_encoder("a").shape)  # (27,)

As far as I understand, it should encode to a tensor of shape (26,). Why does it encode to a tensor of shape (27,) instead? Is the extra label there to represent "no class"?

CodePudding user response:

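Index 0 is reserved for the OOV (out-of-vocabulary) token, because num_oov_indices defaults to 1. Any input that is not in the vocabulary is encoded at that index, so the output has 26 + 1 = 27 entries. If you don't want that, you can set num_oov_indices to zero; inputs outside the vocabulary will then raise an error: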

one_hot_encoder = tf.keras.layers.StringLookup(vocabulary=list(alphabet), num_oov_indices=0, output_mode='one_hot')
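
To make the index layout concrete, here is a minimal pure-Python sketch of the behaviour described above. It is a simplified model, not the Keras implementation: the function name one_hot_lookup is hypothetical, it only handles 0 or 1 OOV slots, it ignores mask tokens, and the ValueError it raises is an assumption about the error type rather than what TensorFlow actually raises.

```python
def one_hot_lookup(token, vocabulary, num_oov_indices=1):
    """Simplified model of the index layout (hypothetical helper, not Keras code)."""
    if num_oov_indices not in (0, 1):
        raise ValueError("this sketch only models 0 or 1 OOV slots")

    # The OOV slot, if any, comes first; vocabulary entries follow it.
    depth = num_oov_indices + len(vocabulary)

    if token in vocabulary:
        index = num_oov_indices + vocabulary.index(token)
    elif num_oov_indices == 1:
        index = 0  # unknown tokens share the reserved slot
    else:
        # Assumption: the real layer also errors here, possibly with another type.
        raise ValueError(f"{token!r} is not in the vocabulary")

    vector = [0] * depth
    vector[index] = 1
    return vector


alphabet = sorted("abcdefghijklmnopqrstuvwxyz")

with_oov = one_hot_lookup("a", alphabet)
without_oov = one_hot_lookup("a", alphabet, num_oov_indices=0)
unknown = one_hot_lookup("?", alphabet)

print(len(with_oov), with_oov.index(1))        # 27 1
print(len(without_oov), without_oov.index(1))  # 26 0
print(len(unknown), unknown.index(1))          # 27 0
```

With the default of one OOV slot, "a" moves from position 0 to position 1, and anything outside the alphabet lands at position 0. That accounts for the extra label.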