I mean, I get that the optimizer needs an ID to keep track of per-variable state, like the last gradient for that variable and so on, but can't we just have an optimizer for a specific tensor?
import tensorflow as tf

a = tf.convert_to_tensor([1.])
with tf.GradientTape() as tape:
    tape.watch(a)  # plain tensors have to be watched explicitly
    loss = a**2
grad = tape.gradient(loss, a)
print(grad)
# <tf.Tensor: shape=(1,), dtype=float32, numpy=array([2.], dtype=float32)>
So we can compute the gradient of a tensor, but we can't hand that gradient to an optimizer, because a is not a Variable; that is, we can't just do the following:
tf.keras.optimizers.Adam().apply_gradients(zip([grad], [a]))
because we will get:
AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute '_unique_id'
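For comparison, the same call does go through when a is a tf.Variable (a minimal check):

import tensorflow as tf

a = tf.Variable([1.])  # Variables are watched by the tape automatically
with tf.GradientTape() as tape:
    loss = a**2
grad = tape.gradient(loss, a)
tf.keras.optimizers.Adam().apply_gradients(zip([grad], [a]))  # works: a is updated in place
print(a)  # slightly below 1.0 now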
But we could: an optimizer step is essentially w = w - stepsize * grad, we have w and we have grad, so why can't we just do that inside an optimizer? Is there anything I can do to apply the formula from the Adam paper to w without making it a tf.Variable?
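For what it's worth, plain gradient descent already works fine on ordinary tensors, e.g. this minimal sketch:

import tensorflow as tf

stepsize = 0.1
w = tf.convert_to_tensor([1.])
for _ in range(3):
    with tf.GradientTape() as tape:
        tape.watch(w)
        loss = w**2
    grad = tape.gradient(loss, w)
    w = w - stepsize * grad  # manual update, no Variable involved
print(w)  # [0.512], moving towards the minimum at 0

so I would expect the same to be possible for the Adam update rule.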
Answer:
I've coded up an Adam optimizer from scratch without using tf.Variable, so it works with plain tensors. In case anybody needs it, here it is:
import tensorflow as tf

class TensorAdamOptimizer:
    def __init__(self, stepsize=1e-3, beta_1=0.9, beta_2=0.999, eps=1e-10):
        self.stepsize = stepsize
        self.beta_1 = beta_1
        self.beta_2 = beta_2
        self.eps = eps
        self.time = 0
        self.first_movement = None
        self.second_movement = None

    def init(self, shape):
        # Lazily allocate the moment estimates once the gradient shape is known.
        self.first_movement = tf.zeros(shape)
        self.second_movement = tf.zeros(shape)

    def calculate_update(self, gradient):
        if self.first_movement is None or self.second_movement is None:
            self.init(tf.shape(gradient))
        self.time += 1
        # Exponential moving averages of the gradient and its square.
        self.first_movement = self.beta_1 * self.first_movement + (1 - self.beta_1) * gradient
        self.second_movement = self.beta_2 * self.second_movement + (1 - self.beta_2) * gradient**2
        # Bias correction, as in the Adam paper.
        first_movement_corrected = self.first_movement / (1 - self.beta_1**self.time)
        second_movement_corrected = self.second_movement / (1 - self.beta_2**self.time)
        return self.stepsize * first_movement_corrected / (tf.sqrt(second_movement_corrected) + self.eps)

    def reset(self):
        self.time = 0  # also reset the step counter so bias correction starts over
        self.first_movement = tf.zeros_like(self.first_movement)
        self.second_movement = tf.zeros_like(self.second_movement)
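Usage is then, for example (a minimal sketch minimizing w**2, with a made-up stepsize):

import tensorflow as tf

opt = TensorAdamOptimizer(stepsize=1e-1)
w = tf.convert_to_tensor([1.])
for _ in range(200):
    with tf.GradientTape() as tape:
        tape.watch(w)
        loss = w**2
    grad = tape.gradient(loss, w)
    w = w - opt.calculate_update(grad)  # subtract the Adam step from the plain tensor
print(w)  # close to 0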