I would like to implement the learning rate schedule from the paper Attention Is All You Need. I have the code below in TensorFlow, but I would like to implement it in PyTorch too. I know that PyTorch has modules for this (https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html), but how would I go about writing a custom scheduler? Or does one of the existing lr_scheduler classes already fulfil the same function?
TensorFlow code:
class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super(CustomSchedule, self).__init__()
        self.d_model = d_model
        self.d_model = tf.cast(self.d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        arg1 = tf.math.rsqrt(step)
        arg2 = step * (self.warmup_steps ** -1.5)
        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)

learning_rate = CustomSchedule(d_model)

optimizer = tf.keras.optimizers.Adam(learning_rate, beta_1=0.9, beta_2=0.98,
                                     epsilon=1e-9)
PyTorch?
import torch
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, betas=(0.9, 0.98), eps=1e-9)
scheduler =
CodePudding user response:
Since this is the scheduler used in a popular paper (Attention Is All You Need), reasonably good implementations already exist online.
You can grab a PyTorch implementation (the ScheduledOptim class) from the attention-is-all-you-need-pytorch repository by @jadore801120: https://github.com/jadore801120/attention-is-all-you-need-pytorch
Once you have it, usage is simply:
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, betas=(0.9, 0.98), eps=1e-9)
sched = ScheduledOptim(optimizer, d_model=..., n_warmup_steps=...)
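Under the hood, a wrapper like that just recomputes the learning rate from the paper's formula and writes it into the optimizer's param groups on every step. A simplified sketch of the idea (illustrative only, not the repository's exact code; the class name NoamWrapper is mine):

class NoamWrapper:
    """Simplified sketch of a Noam-style scheduler wrapper (illustrative only)."""

    def __init__(self, optimizer, d_model, n_warmup_steps):
        self._optimizer = optimizer
        self.d_model = d_model
        self.n_warmup_steps = n_warmup_steps
        self.n_steps = 0

    def zero_grad(self):
        self._optimizer.zero_grad()

    def step_and_update_lr(self):
        # lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
        self.n_steps += 1
        lr = (self.d_model ** -0.5) * min(self.n_steps ** -0.5,
                                          self.n_steps * self.n_warmup_steps ** -1.5)
        for param_group in self._optimizer.param_groups:
            param_group['lr'] = lr
        self._optimizer.step()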
Also make sure to invoke the scheduler at the right time:

for i, batch in enumerate(dataloader):
    sched.zero_grad()           # zeroes the gradients via the wrapped optimizer
    ...                         # forward pass and loss computation
    loss.backward()
    sched.step_and_update_lr()  # optimizer step plus learning rate update
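If you'd rather not depend on external code, the same schedule can also be expressed with torch.optim.lr_scheduler.LambdaLR. The sketch below assumes d_model = 512 and warmup_steps = 4000 as in the paper, and sets the optimizer's base lr to 1.0 so the lambda's return value is the effective learning rate:

import torch
from torch.optim.lr_scheduler import LambdaLR

d_model = 512        # assumed model dimension
warmup_steps = 4000  # as in the paper

# Base lr of 1.0 means the lambda below returns the actual learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)

def noam_lambda(step):
    # LambdaLR starts counting at step 0; shift to avoid 0 ** -0.5.
    step = max(step, 1)
    return (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)

scheduler = LambdaLR(optimizer, lr_lambda=noam_lambda)

for i, batch in enumerate(dataloader):
    optimizer.zero_grad()
    ...                  # forward pass and loss computation
    loss.backward()
    optimizer.step()
    scheduler.step()     # advance the schedule once per optimizer step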