Say I have a simple NN:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils import parameters_to_vector

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(1, 2)
        self.fc2 = nn.Linear(2, 3)
        self.fc3 = nn.Linear(3, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = torch.relu(x)
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Model()
opt = optim.Adam(net.parameters())
And also some features:
features = torch.rand((3,1))
I can train it normally using:
for i in range(10):
    opt.zero_grad()
    out = net(features)
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()
However, I'm interested in updating the weights of each layer after each example in the batch. That is, updating the actual weight values by some amount that will be different for each layer.
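For concreteness, the kind of per-example loop I have in mind looks roughly like this (the layer-specific update step is the part I'm asking about):
for x in features:                              # one example at a time
    opt.zero_grad()
    out = net(x.unsqueeze(0))                   # keep a batch dimension of 1
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    # <-- here I'd like to nudge each layer's weights by a layer-specific amount
    loss.backward()
    opt.step()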
I can print the parameters of each layer with:
for i in range(1):
    opt.zero_grad()
    out = net(features)
    print(parameters_to_vector(net.fc1.parameters()))
    print(parameters_to_vector(net.fc2.parameters()))
    print(parameters_to_vector(net.fc3.parameters()))
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()
How can I change the values of the weights before the backprop without affecting the gradient?
Say that I want the layers' weights to be updated according to the following functions:
def first_layer_update(weight):
    return weight + 1e-3*weight

def second_layer_update(weight):
    return 1e-2*weight

def third_layer_update(weight):
    return weight - 1e-1*weight
Answer:
- Using the torch.no_grad context manager.
This allows you to perform (in-place or out-of-place) operations on your tensors without Autograd keeping track of those changes. As @user3474165 explained:
def first_layer_update(weight):
    with torch.no_grad():
        return weight + 1e-3*weight

def second_layer_update(weight):
    with torch.no_grad():
        return 1e-2*weight

def third_layer_update(weight):
    with torch.no_grad():
        return weight - 1e-1*weight
Or, without altering your functions, by using the context manager when calling them:
with torch.no_grad():
    first_layer_update(net.fc1.weight)
    second_layer_update(net.fc2.weight)
    third_layer_update(net.fc3.weight)
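Note that, as written, these functions return new tensors rather than modifying weight in place, so the calls above won't change the model by themselves. A minimal sketch of writing the results back into the parameters (using Tensor.copy_, still inside no_grad):
with torch.no_grad():
    net.fc1.weight.copy_(first_layer_update(net.fc1.weight))
    net.fc2.weight.copy_(second_layer_update(net.fc2.weight))
    net.fc3.weight.copy_(third_layer_update(net.fc3.weight))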
- Using the @torch.no_grad decorator.
A variant is to use the @torch.no_grad decorator:
@torch.no_grad()
def first_layer_update(weight):
    return weight + 1e-3*weight

@torch.no_grad()
def second_layer_update(weight):
    return 1e-2*weight

@torch.no_grad()
def third_layer_update(weight):
    return weight - 1e-1*weight
And call these with: first_layer_update(net.fc1.weight), second_layer_update(net.fc2.weight), etc...
- Mutating torch.Tensor.data.
An alternative to wrapping your operations with the torch.no_grad context is to mutate the weights using their data attribute. This means calling your functions with:
>>> first_layer_update(net.fc1.weight.data)
>>> second_layer_update(net.fc2.weight.data)
>>> third_layer_update(net.fc3.weight.data)
Which would mutate the weights (not the biases) of the three layers with their respective update policies.
In a nutshell, if you want to mutate all parameters of an nn.Module you can either do:
>>> with torch.no_grad():
...     update_policy(parameters_to_vector(net.layer.parameters()))
Or
>>> update_policy(parameters_to_vector(net.layer.parameters()).data)
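If you go the parameters_to_vector route, note that it returns a flattened copy of the parameters, so to push the updated values back into the module you could pair it with torch.nn.utils.vector_to_parameters (a sketch, shown here for fc1):
from torch.nn.utils import vector_to_parameters

with torch.no_grad():
    vec = parameters_to_vector(net.fc1.parameters())
    vector_to_parameters(first_layer_update(vec), net.fc1.parameters())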
Answer:
Following the PyTorch docs, you're basically on the right track. You can loop over all the parameters in each layer and add to them directly:
with torch.no_grad():
    for param in layer.parameters():   # e.g. layer = net.fc1
        param += 1e-3                  # or whatever update you want
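Putting this together with the per-layer policies from the question, one possible sketch of a training loop (not from the answers above) applies a different in-place tweak to each layer after the optimizer step:
for i in range(10):
    opt.zero_grad()
    out = net(features)
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()

    # layer-specific tweaks, applied outside the forward/backward region
    with torch.no_grad():
        for param in net.fc1.parameters():
            param += 1e-3 * param
        for param in net.fc2.parameters():
            param.mul_(1e-2)
        for param in net.fc3.parameters():
            param -= 1e-1 * param
The tweaks are placed after opt.step() here because mutating a parameter in place between the forward and backward pass can trip autograd's in-place checks for tensors saved for backward (which is what the .data approach above sidesteps).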