I have a question about how tf.keras.constraints works.
(1)
class WeightsSumOne(tf.keras.constraints.Constraint):
    def __call__(self, w):
        return tf.nn.softmax(w, axis=0)

output = layers.Dense(1, use_bias=False,
                      kernel_constraint=WeightsSumOne())(input)
(2)
intermediate = layers.Dense(1, use_bias = False)
intermediate.set_weights(tf.nn.softmax(intermediate.get_weights(), axis=0))
Do (1) and (2) perform the same process?
I ask because the Keras documentation says of constraints:
"They are per-variable projection functions applied to the target variable after each gradient update (when using fit())." (https://keras.io/api/layers/constraints/)
In (2), by contrast, I think the softmax is applied before each gradient update.
In my opinion, the gradients of the weights in (1) and (2) are different, because the softmax is applied before the gradient calculation in the second case, but after the gradient calculation in the first case.
If I am wrong, I would appreciate it if you could point out where.
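The gradient argument can be checked numerically. The following framework-free sketch compares the gradient taken directly with respect to the kernel (what the layer in (1) sees, since the constraint runs after the update) with the gradient taken through a softmax placed inside the forward pass. Note the assumption: a one-time `set_weights` call as in (2) does not by itself put a softmax into the forward pass, so the second function models a reparameterized layer rather than (2) verbatim. The input `x`, the target `y` and all helper names are hypothetical.

```python
import math

def softmax(v):
    """Softmax over a flat list (mirrors tf.nn.softmax(w, axis=0) for an (n, 1) kernel)."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def loss(w, x, y):
    """Squared error of a bias-free Dense(1) layer on one sample."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return (pred - y) ** 2

def grad_direct(w, x, y):
    """dL/dw when w itself is the trainable kernel."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return [2.0 * (pred - y) * xi for xi in x]

def grad_through_softmax(v, x, y):
    """dL/dv when the kernel is softmax(v) inside the forward pass."""
    s = softmax(v)
    g = grad_direct(s, x, y)                      # dL/ds
    dot = sum(gi * si for gi, si in zip(g, s))
    # Chain rule through the softmax Jacobian: s_j * (g_j - sum_i g_i * s_i)
    return [si * (gi - dot) for si, gi in zip(s, g)]

x = [0.5, -1.0, 2.0, 0.0, 1.5]   # hypothetical input
y = 1.0                          # hypothetical target
v = [1.0] * 5                    # raw parameters
w = softmax(v)                   # projected kernel, as a constraint would store it

g_constraint = grad_direct(w, x, y)
g_reparam = grad_through_softmax(v, x, y)
print(g_constraint)
print(g_reparam)
```

The second gradient sums to zero across the weights, because shifting every raw parameter by the same amount leaves the softmax unchanged; the first gradient has no such property, so the two update rules generally move the weights differently.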
Answer:
They are not the same.
In the first case, the constraint is applied to the weights,
but in the second case (modeled below with activation='softmax') it is applied to the output of the dense
layer, after the weights are multiplied with the inputs.
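To make "applied to the weights" concrete, here is a minimal, framework-free sketch of a training loop for the first case. It assumes a plain SGD step and a squared-error loss on a single sample (SGD is swapped in for an adaptive optimizer only to keep the arithmetic short), and the helper names `softmax`, `mse_grad` and `constrained_step`, the input `x`, the target `y` and the learning rate are all hypothetical. The projection runs after each update, so the gradient never flows through the softmax.

```python
import math

def softmax(w):
    """Softmax over a flat list (mirrors tf.nn.softmax(w, axis=0) for an (n, 1) kernel)."""
    m = max(w)
    exps = [math.exp(v - m) for v in w]
    total = sum(exps)
    return [e / total for e in exps]

def mse_grad(w, x, y):
    """Gradient of (w.x - y)^2 with respect to w for a bias-free Dense(1) layer."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return [2.0 * (pred - y) * xi for xi in x]

def constrained_step(w, x, y, lr=0.1):
    # 1) The gradient is computed at the current weights; softmax is not involved.
    g = mse_grad(w, x, y)
    # 2) Plain gradient update (assumed SGD).
    w = [wi - lr * gi for wi, gi in zip(w, g)]
    # 3) The constraint projects the updated weights, after the update.
    return softmax(w)

w = [1.0, 1.0, 1.0, 1.0, 1.0]    # kernel initialized to ones
x = [0.5, -1.0, 2.0, 0.0, 1.5]   # hypothetical input
y = 1.0                          # hypothetical target
for _ in range(3):
    w = constrained_step(w, x, y)

print(w)
print(sum(w))
```

After every step the stored weights are positive and sum to 1 (up to floating-point error), which is the property the constraint is meant to enforce.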
Construct a model for the first case:
import tensorflow as tf
from tensorflow import keras

# WeightsSumOne is the constraint class defined in (1)
inp = keras.Input(shape=(3, 5))
out = keras.layers.Dense(1, use_bias=False,
                         kernel_initializer=tf.ones_initializer(),
                         kernel_constraint=WeightsSumOne())(inp)
model = keras.Model(inp, out)
model.compile('adam', 'mse')
Do a dummy run:
inputs = tf.random.normal(shape=(1, 3, 5))
outputs = tf.random.normal(shape=(1, 3, 1))
model.fit(inputs, outputs, epochs=1)
Check the layer weights of model:
print(model.layers[1].get_weights()[0])
# output
array([[0.2],
       [0.2],
       [0.2],
       [0.2],
       [0.2]])
Construct the model for the second case:
inp = keras.Input(shape=(3, 5))
out = keras.layers.Dense(1, activation='softmax', use_bias=False,
                         kernel_initializer=tf.ones_initializer())(inp)
model1 = keras.Model(inp, out)
model1.compile('adam', 'mse')
# dummy run
model1.fit(inputs, outputs, epochs=1)
Check the layer weights of model1:
print(model1.layers[1].get_weights()[0])
# output
array([[1.],
       [1.],
       [1.],
       [1.],
       [1.]])
We can see that the layer weights of model are the softmax of the layer weights of model1.
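A quick arithmetic check of the printed values, as a framework-free sketch (the `softmax` helper is hypothetical and stands in for `tf.nn.softmax` over one axis). Softmax over five equal entries gives 0.2 each, which matches the kernel of model. Softmax over an axis of length 1 is always 1, so, assuming `activation='softmax'` normalizes over the last axis as it does by default in Keras, a `Dense(1, activation='softmax')` layer would produce a constant output. That would be consistent with the kernel of model1 staying at its initial value of 1.

```python
import math

def softmax(v):
    """Softmax over a flat list of logits."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

# Five equal kernel entries, as produced by tf.ones_initializer()
equal_kernel = softmax([1.0] * 5)
print(equal_kernel)

# A single logit, as in the last axis of a Dense(1) output
single_logit = softmax([3.7])
print(single_logit)
```

With a constant output, the loss no longer depends on the kernel, so a gradient-based optimizer would have nothing to update.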