Adam optimizer between TF1 and TF2

I am trying to replicate the same result between TF1 and TF2. Below is a simple example using the Adam optimizer.

Here in TF2:

import tensorflow as tf

x = tf.Variable([1, 2, 3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.5, epsilon=1e-08)
optimizer.apply_gradients(zip([grad], [x]))  # eager mode: the update is applied immediately
print(x)

x is: <tf.Variable 'Variable:0' shape=(3,) dtype=float32, numpy=array([0.49998665, 1.4999859 , 2.4999857 ], dtype=float32)>

While in TF1:

import tensorflow as tf

x = tf.Variable([1, 2, 3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.5)
optimizer.apply_gradients(zip([grad], [x]))

init_op = tf.initialize_all_variables()
with tf.Session() as sess:
  sess.run(init_op)
  print(sess.run(x))

x is: [1. 2. 3.]

Does anyone know what causes the inconsistency between TF1 and TF2 when using the Adam optimizer? I do not exclude the possibility that my implementation is wrong.

I would greatly appreciate it if anyone could tell me what I am doing wrong in TF1 that prevents me from getting the same result as in TF2.

Many thanks!

CodePudding user response:

If you instead do this:

import tensorflow as tf

x = tf.Variable([1, 2, 3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.5)
step = optimizer.apply_gradients(zip([grad], [x]))  # this only builds the update op

init_op = tf.initialize_all_variables()
with tf.Session() as sess:
  sess.run(init_op)
  sess.run(step)  # the op has to be run explicitly to update x
  print(x.eval())

You get the same result (barring what I think could be floating point inaccuracies).

[0.50000155 1.5000007  2.5000005 ]
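
For what it's worth, here is a minimal NumPy sketch of a single textbook Adam step (Kingma & Ba, Algorithm 1), assuming the default beta_1=0.9 and beta_2=0.999. TF's implementations fold the bias correction into the step size and add epsilon to the uncorrected sqrt(v) (the "epsilon hat" formulation), which I believe is where the small differences in the last digits come from:

import numpy as np

x = np.array([1.0, 2.0, 3.0], dtype=np.float32)
g = np.array([0.1, 0.2, 0.3], dtype=np.float32)
lr, beta1, beta2, eps = 0.5, 0.9, 0.999, 1e-8

m = (1 - beta1) * g            # first-moment estimate (m_0 = 0)
v = (1 - beta2) * g ** 2       # second-moment estimate (v_0 = 0)
m_hat = m / (1 - beta1 ** 1)   # bias correction at step t = 1
v_hat = v / (1 - beta2 ** 1)
x -= lr * m_hat / (np.sqrt(v_hat) + eps)
print(x)  # roughly [0.5 1.5 2.5]: on the first step each entry moves by about lr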

CodePudding user response:

Reproducibility is a tricky yet crucial part of commercial AI/ML projects.

Here's the v1 implementation of Adam on GitHub: https://github.com/tensorflow/tensorflow/blob/4c081973a6374ce867794ad66a5c4b204c310afb/tensorflow/python/keras/optimizer_v1.py#L468

And here’s the v2 one: https://github.com/keras-team/keras/blob/v2.7.0/keras/optimizer_v2/adam.py

They are implemented slightly differently. I found this in the v2 documentation: "Many optimizer subclasses, such as Adam and Adagrad, allocate and manage additional variables associated with the variables to train. These are called Slots. Slots have names and you can ask the optimizer for the names of the slots that it uses. Once you have a slot name you can ask the optimizer for the variable it created to hold the slot value."
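
For example, here is a small sketch of inspecting those slots, assuming a TF 2.x version where tf.keras.optimizers.Adam is still the optimizer_v2 class from the second link (which exposes get_slot_names() and get_slot()). For Adam, the slots hold the running first and second moment estimates, typically named "m" and "v":

import tensorflow as tf

x = tf.Variable([1, 2, 3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.5, epsilon=1e-08)
optimizer.apply_gradients(zip([grad], [x]))  # slot variables are created on first use

print(optimizer.get_slot_names())            # e.g. ['m', 'v']
print(optimizer.get_slot(x, 'm').numpy())    # first-moment slot for x
print(optimizer.get_slot(x, 'v').numpy())    # second-moment slot for x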

Also, if you're trying to migrate code from TF1 to TF2, much of it can be done automatically, as described at https://www.tensorflow.org/guide/migrate.
