Here's my custom softplus activation:
def my_softplus(z):
    return tf.math.log(tf.exp(tf.cast(z, tf.float32)) + 1)
If I run a small test:
my_softplus([-3.0, -1.0, 0.0, 2.0])
it returns
<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.04858733, 0.31326166, 0.6931472 , 2.126928])>
When I run TensorFlow's own softplus activation function:
tf.keras.activations.softplus([-3.0, -1.0, 0.0, 2.0])
I get
<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.04858736, 0.31326172, 0.6931472 , 2.126928 ], dtype=float32)>
The results are very similar, except for the last few digits, which differ.
When I fit the following model on a subset of the MNIST dataset:
model2 = models.Sequential()
model2.add(layers.Flatten(input_shape=(28, 28)))
model2.add(layers.Dense(16,
                        activation="softplus",  # or my_softplus <- the activation being swapped
                        kernel_initializer=my_glorot_initializer,
                        kernel_regularizer=my_l1_regularizer,
                        # kernel_constraint=my_positive_weights
                        ))
model2.add(layers.Dense(16, activation="relu"))
model2.add(layers.Dense(10, activation="softmax"))
model2.compile(optimizer="rmsprop",
               loss=tf.keras.losses.SparseCategoricalCrossentropy(),
               metrics=["accuracy"])
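(The fit call itself is not shown; a call along these lines, with hypothetical arguments and the partial_x_train/x_val split from the answer below, produces per-epoch logs of the kind shown next.)
# Hypothetical fit call; the actual arguments are not in the question.
history2 = model2.fit(partial_x_train, partial_y_train,
                      epochs=20,                       # matches "Epoch 1/20" in the log
                      validation_data=(x_val, y_val),  # produces the val_loss / val_accuracy columns
                      verbose=2)                       # produces the compact per-epoch lines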
The fitting returns something like
Epoch 1/20
20/20 - 2s - loss: -2.9399e-01 - accuracy: 0.1064 - val_loss: -2.1013e-01 - val_accuracy: 0.1136
Epoch 2/20
20/20 - 1s - loss: -9.9094e-02 - accuracy: 0.1064 - val_loss: 0.0140 - val_accuracy: 0.1136
However, when I use my custom my_softplus activation function, I get NaN losses.
Why is that?
Note: You can comment out the kernel_initializer and kernel_regularizer when building the model, and the results will be similar.
Note 2: Here's a link to a Google Colab notebook with an MWE.
Answer:
In Colab, you did not normalize the data:
# creating a validation set
x_val = x_train[:50000]
partial_x_train = x_train[50000:]
y_val = y_train[:50000]
partial_y_train = y_train[50000:]
So the network had to process very large input values, which yielded a NaN loss.
Example (your implementation):
def my_softplus(z):
    return tf.math.log(tf.exp(tf.cast(z, tf.float32)) + 1)
my_softplus(100)
>> <tf.Tensor: shape=(), dtype=float32, numpy=inf>
When you use softplus (TF's built-in) as the activation in the dense layer, it guards against underflow and overflow.
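For contrast, the built-in activation handles the same large input without overflowing, since softplus(x) ≈ x for large x:
tf.keras.activations.softplus(tf.constant(100.0))  # ≈ 100.0, not inf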
In your setup, if you want your custom activation to give similar results, you need to normalize the data.
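A minimal sketch of that normalization step, assuming the notebook loads MNIST via tf.keras.datasets.mnist:
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1] before splitting off the validation set,
# so the dense layers never receive large raw inputs.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

x_val, partial_x_train = x_train[:50000], x_train[50000:]
y_val, partial_y_train = y_train[:50000], y_train[50000:]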
Source code of Softplus: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/softplus_op.h#L31-L58
In case the link changes, I'll copy it here.
template <typename Device, typename T>
struct Softplus {
  // Computes Softplus activation.
  //
  // features: any shape.
  // activations: same shape as "features".
  void operator()(const Device& d, typename TTypes<T>::ConstTensor features,
                  typename TTypes<T>::Tensor activations) {
    // Choose a threshold on x below which exp(x) may underflow
    // when added to 1, but for which exp(x) is always within epsilon of the
    // true softplus(x). Offset of 2 from machine epsilon checked
    // experimentally for float16, float32, float64. Checked against
    // softplus implemented with numpy's log1p and numpy's logaddexp.
    static const T threshold =
        Eigen::numext::log(Eigen::NumTraits<T>::epsilon()) + T(2);
    // Value above which exp(x) may overflow, but softplus(x) == x
    // is within machine epsilon.
    auto too_large = features > features.constant(-threshold);
    // Value below which exp(x) may underflow, but softplus(x) == exp(x)
    // is within machine epsilon.
    auto too_small = features < features.constant(threshold);
    auto features_exp = features.exp();
    activations.device(d) = too_large.select(
        features,                       // softplus(x) ~= x for x large
        too_small.select(features_exp,  // softplus(x) ~= exp(x) for x small
                         features_exp.log1p()));
  }
};
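For completeness, here is a sketch of a numerically stable replacement for my_softplus in Python. It uses the standard identity softplus(x) = max(x, 0) + log1p(exp(-|x|)) rather than the kernel's exact thresholding, but it sidesteps the overflow in the same spirit:
import tensorflow as tf

def stable_softplus(z):
    # exp(-|z|) never overflows, and log1p keeps precision when its
    # argument is tiny; the result equals log(1 + exp(z)).
    z = tf.cast(z, tf.float32)
    return tf.maximum(z, 0.0) + tf.math.log1p(tf.exp(-tf.abs(z)))

stable_softplus(100.0)                   # ≈ 100.0 instead of inf
stable_softplus([-3.0, -1.0, 0.0, 2.0])  # matches tf.keras.activations.softplus
Dropping this in as the Dense layer's activation avoids the inf/NaN path, although normalizing the inputs is still the better fix.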