Say I have a simple NN:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils import parameters_to_vector

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(1, 2)
        self.fc2 = nn.Linear(2, 3)
        self.fc3 = nn.Linear(3, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = torch.relu(x)
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Model()
opt = optim.Adam(net.parameters())
And also some features:
features = torch.rand((3,1))
I can train it normally using:
for i in range(10):
    opt.zero_grad()
    out = net(features)
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()
However, I'm interested in updating the weights of each layer after each example in the batch. That is, updating the actual weight values by some amount that will be different for each layer.
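For concreteness, the kind of per-example loop I have in mind looks roughly like this (the layer-specific update step is the part I'm asking about):
for x in features:                              # one example at a time
    opt.zero_grad()
    out = net(x.unsqueeze(0))                   # keep a batch dimension of 1
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    # <-- here I'd like to nudge each layer's weights by a layer-specific amount
    loss.backward()
    opt.step()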
I can print the parameters of each layer with:
for i in range(1):
    opt.zero_grad()
    out = net(features)
    print(parameters_to_vector(net.fc1.parameters()))
    print(parameters_to_vector(net.fc2.parameters()))
    print(parameters_to_vector(net.fc3.parameters()))
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()
How can I change the values of the weights before the backprop without affecting the gradient?
Say that I want the layers' weights to be updated according to the following functions:
def first_layer_update(weight):
    return weight + 1e-3*weight

def second_layer_update(weight):
    return 1e-2*weight

def third_layer_update(weight):
    return weight - 1e-1*weight
Answer:
- Using the torch.no_grad context manager.
This allows you to perform (in-place or out-of-place) operations on your tensors without Autograd keeping track of those changes. As @user3474165 explained:
def first_layer_update(weight):
    with torch.no_grad():
        return weight + 1e-3*weight

def second_layer_update(weight):
    with torch.no_grad():
        return 1e-2*weight

def third_layer_update(weight):
    with torch.no_grad():
        return weight - 1e-1*weight
Or, without altering your functions, by using the context manager when calling them:
with torch.no_grad():
    first_layer_update(net.fc1.weight)
    second_layer_update(net.fc2.weight)
    third_layer_update(net.fc3.weight)
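Note that, as written, these functions return new tensors rather than modifying weight in place, so the calls above won't change the model by themselves. A minimal sketch of writing the results back into the parameters (using Tensor.copy_, still inside no_grad):
with torch.no_grad():
    net.fc1.weight.copy_(first_layer_update(net.fc1.weight))
    net.fc2.weight.copy_(second_layer_update(net.fc2.weight))
    net.fc3.weight.copy_(third_layer_update(net.fc3.weight))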
- Using the @torch.no_grad decorator.
A variant is to use the @torch.no_grad decorator:
@torch.no_grad()
def first_layer_update(weight):
    return weight + 1e-3*weight

@torch.no_grad()
def second_layer_update(weight):
    return 1e-2*weight

@torch.no_grad()
def third_layer_update(weight):
    return weight - 1e-1*weight
And call these with: first_layer_update(net.fc1.weight), second_layer_update(net.fc2.weight), etc...
- Mutating torch.Tensor.data.
An alternative to wrapping your operations with the torch.no_grad context is to mutate the weights using their data attribute. This means calling your functions with:
>>> first_layer_update(net.fc1.weight.data)
>>> second_layer_update(net.fc2.weight.data)
>>> third_layer_update(net.fc3.weight.data)
Which would mutate the weights (not the biases) of the three layers with their respective update policies.
In a nutshell, if you want to mutate all parameters of an nn.Module you can either do:
>>> with torch.no_grad():
...     update_policy(parameters_to_vector(net.layer.parameters()))
Or
>>> update_policy(parameters_to_vector(net.layer.parameters()).data)
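If you go the parameters_to_vector route, note that it returns a flattened copy of the parameters, so to push the updated values back into the module you could pair it with torch.nn.utils.vector_to_parameters (a sketch, shown here for fc1):
from torch.nn.utils import vector_to_parameters

with torch.no_grad():
    vec = parameters_to_vector(net.fc1.parameters())
    vector_to_parameters(first_layer_update(vec), net.fc1.parameters())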
Answer:
Following the PyTorch docs, you're basically on the right track. You can loop over all the parameters in each layer and add to them directly:
with torch.no_grad():
    for param in layer.parameters():   # e.g. layer = net.fc1
        param += 1e-3                  # or whatever update you want
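Putting this together with the per-layer policies from the question, one possible sketch of a training loop (not from the answers above) applies a different in-place tweak to each layer after the optimizer step:
for i in range(10):
    opt.zero_grad()
    out = net(features)
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()

    # layer-specific tweaks, applied outside the forward/backward region
    with torch.no_grad():
        for param in net.fc1.parameters():
            param += 1e-3 * param
        for param in net.fc2.parameters():
            param.mul_(1e-2)
        for param in net.fc3.parameters():
            param -= 1e-1 * param
The tweaks are placed after opt.step() here because mutating a parameter in place between the forward and backward pass can trip autograd's in-place checks for tensors saved for backward (which is what the .data approach above sidesteps).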