In the TensorFlow "ML Basics with Keras" tutorial on basic text classification, when preparing the trained model for export, the tutorial suggests including the TextVectorization layer in the model so it can "process raw strings". I understand why this is done.
But then the code snippet is:
export_model = tf.keras.Sequential([
    vectorize_layer,
    model,
    layers.Activation('sigmoid')
])
Why, when preparing the model for export, does the tutorial also include a new activation layer, layers.Activation('sigmoid')? Why not incorporate this layer into the original model?
CodePudding user response:
Before the TextVectorization layer was introduced, you had to clean up your raw strings manually. This usually meant removing punctuation, lowercasing, tokenizing, and so forth:
# Raw string
"Furthermore, he asked himself why it happened to Billy?"
# Remove punctuation
"Furthermore he asked himself why it happened to Billy"
# Lower-case
"furthermore he asked himself why it happened to billy"
# Tokenize
['furthermore', 'he', 'asked', 'himself', 'why', 'it', 'happened', 'to', 'billy']
If you include the TextVectorization layer in your model when you export it, you can essentially feed raw strings into your model for prediction without having to clean them up first.
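For context, the layer is typically created and fitted to the raw training text before being bundled into the export model. A minimal sketch (the parameter values here are assumptions, and raw_train_ds stands in for the tutorial's raw text dataset):

import tensorflow as tf

vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=10000,             # assumed vocabulary size
    output_mode='int',
    output_sequence_length=250)   # assumed sequence length

# Learn the vocabulary from the raw training texts (labels stripped)
vectorize_layer.adapt(raw_train_ds.map(lambda text, label: text))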
Regarding your second question: the sigmoid is missing from the model itself because the last Dense layer uses a linear activation and therefore outputs raw logits. The tutorial trains the model with losses.BinaryCrossentropy(from_logits=True), which applies the sigmoid internally as part of the loss; this is more numerically stable than putting a sigmoid on the output layer during training.
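For reference, the training setup looks roughly like this (a simplified sketch of the tutorial's model; layer sizes are assumptions):

# The last Dense layer has no activation, so it outputs logits
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1)])

# from_logits=True tells the loss to apply the sigmoid internally,
# which is more numerically stable than a sigmoid in the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])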
The problem with a linear activation function during inference is that it can output negative values:
# With linear activation function
examples = [
    "The movie was great!",
    "The movie was okay.",
    "The movie was terrible..."
]
export_model.predict(examples)
'''
array([[ 0.4543204 ],
       [-0.26730654],
       [-0.61234593]], dtype=float32)
'''
For example, the value -0.26730654 could indicate that the review "The movie was okay." is negative, but this is not necessarily the case. What one actually wants to predict is the probability that a particular sample belongs to a particular class. Therefore, a sigmoid function is applied at inference time to squeeze the output values between 0 and 1. The output can then be interpreted as the probability that sample x belongs to class n:
# With sigmoid activation function
examples = [
    "The movie was great!",
    "The movie was okay.",
    "The movie was terrible..."
]
export_model.predict(examples)
'''
array([[0.6116659 ],
       [0.43356845],
       [0.35152423]], dtype=float32)
'''
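You can check that these values are simply the sigmoid applied to the earlier logits:

import tensorflow as tf

logits = tf.constant([0.4543204, -0.26730654, -0.61234593])
print(tf.sigmoid(logits).numpy())
# [0.6116659  0.43356845 0.35152423]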
CodePudding user response:
Sometimes you want to see the model's output before the sigmoid, as it may contain useful information, for example about the shape of the score distribution and how it evolves. In such a scenario it is convenient to have the final scaling as a separate entity; otherwise you would have to remove and re-add the sigmoid layer, which means more lines of code and more possible errors. So it may be good practice to apply the sigmoid at the very end, just before saving/exporting. Or it may simply be a convention.
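As a sketch of that workflow, using the names from the question, you can keep the logit-producing model and the probability-producing export model side by side:

import tensorflow as tf

examples = ["The movie was great!", "The movie was okay."]

# Raw logits, e.g. for inspecting the score distribution
logits = model.predict(vectorize_layer(tf.constant(examples)))

# Probabilities from the full export pipeline (sigmoid applied last)
probs = export_model.predict(examples)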