As shown in the figure, it is a 3-layer NN: an input layer, a hidden layer and an output layer. I want to design the NN (in PyTorch, just the architecture) so that the input-to-hidden layer is fully connected. However, from the hidden layer to the output, the first two neurons of the hidden layer should be connected to the first neuron of the output layer, the second two should be connected to the second neuron of the output layer, and so on. How should this be designed?
from torch import nn
layer1 = nn.Linear(input_size, hidden_size)
layer2 = ??????
CodePudding user response:
As @Jan said here, you can extend nn.Linear and provide a pointwise mask to remove the connections you want to avoid. Remember that a fully connected layer is merely a matrix multiplication with an optional additive bias.
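For instance, you can check (with arbitrary sizes) that nn.Linear is nothing more than torch.nn.functional.linear, i.e. x @ weight.T + bias:

import torch
import torch.nn.functional as F

fc = torch.nn.Linear(6, 3)
x = torch.randn(4, 6)
# a fully connected layer is a matrix multiplication plus an optional bias
assert torch.allclose(fc(x), x @ fc.weight.T + fc.bias)
assert torch.allclose(fc(x), F.linear(x, fc.weight, fc.bias))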
Looking at its source code, we can do:
import torch
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    def __init__(self, *args, mask, **kwargs):
        super().__init__(*args, **kwargs)
        # mask comes as (in_features, out_features) like the pattern below,
        # while weight is (out_features, in_features), hence the transpose
        self.register_buffer('mask', mask.T)  # buffer: moves with .to(device)
    def forward(self, input):
        # zero out the masked weights before the matrix multiplication
        return F.linear(input, self.weight * self.mask, self.bias)
Here F is torch.nn.functional. Considering the constraint you have given for the second layer:
the first two neurons of the hidden layer should be connected to the first neuron of the output layer
It seems you are looking for this pattern:
tensor([[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.]])
This can be obtained using torch.block_diag:
mask = torch.block_diag(*[torch.ones(2, 1)] * output_size)  # shape (hidden_size, output_size)
Having this, you can define your network as:
net = nn.Sequential(nn.Linear(input_size, hidden_size),
                    MaskedLinear(hidden_size, output_size, mask=mask))
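As a quick shape check, with the imports and MaskedLinear from above and arbitrary example sizes (input_size = 5, hidden_size = 6, output_size = 3, matching the 6x3 pattern):

input_size, hidden_size, output_size = 5, 6, 3
mask = torch.block_diag(*[torch.ones(2, 1)] * output_size)  # shape (6, 3)
net = nn.Sequential(nn.Linear(input_size, hidden_size),
                    MaskedLinear(hidden_size, output_size, mask=mask))
print(net(torch.randn(8, input_size)).shape)  # torch.Size([8, 3])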
If you feel like it, you can even build the mask inside the custom layer:
class LocalLinear(nn.Linear):
    def __init__(self, *args, kernel_size=2, **kwargs):
        super().__init__(*args, **kwargs)
        # each output neuron is wired to kernel_size consecutive hidden neurons
        assert self.in_features == kernel_size * self.out_features
        # build the mask directly in weight shape (out_features, in_features)
        mask = torch.block_diag(*[torch.ones(1, kernel_size)] * self.out_features)
        self.register_buffer('mask', mask)
    def forward(self, input):
        # zero out the masked weights before the matrix multiplication
        return F.linear(input, self.weight * self.mask, self.bias)
Then define the network like so:
net = nn.Sequential(nn.Linear(input_size, hidden_size),
LocalLinear(hidden_size, output_size))
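As a quick sanity check (using hidden_size = 6 and output_size = 3 as in the 6x3 pattern above), you can verify that the pruned connections receive no gradient:

layer = LocalLinear(6, 3)  # in_features=6, out_features=3, default kernel_size=2
layer(torch.randn(4, 6)).sum().backward()
# the gradient is exactly zero wherever the mask is zero,
# so the masked weights are never updated during training
assert torch.all(layer.weight.grad[layer.mask == 0] == 0)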
CodePudding user response:
Instead of using nn.Linear directly, create a weight tensor weight and a mask tensor mask that masks the weights you do not intend to use. Then use torch.nn.functional.linear(input, weight * mask) (https://pytorch.org/docs/stable/generated/torch.nn.functional.linear.html) to forward the second layer. Note that this goes in your torch.nn.Module's forward function. The weight needs to be registered as a parameter of your nn.Module so that it is recognized by nn.Module.parameters(). See https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_parameter.
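Here is a minimal sketch of that approach, assuming the 2-to-1 wiring from the question; the module name PartiallyConnectedNet and the concrete sizes are illustrative only (assigning an nn.Parameter attribute has the same effect as register_parameter):

import torch
from torch import nn
import torch.nn.functional as F

class PartiallyConnectedNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, kernel_size=2):
        super().__init__()
        assert hidden_size == kernel_size * output_size
        self.fc = nn.Linear(input_size, hidden_size)
        # learnable second-layer weight and bias, registered as parameters
        self.weight = nn.Parameter(torch.randn(output_size, hidden_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(output_size))
        # fixed block-diagonal mask in weight shape (out_features, in_features)
        mask = torch.block_diag(*[torch.ones(1, kernel_size)] * output_size)
        self.register_buffer('mask', mask)

    def forward(self, x):
        h = self.fc(x)
        # only the unmasked weights connect hidden neurons to each output neuron
        return F.linear(h, self.weight * self.mask, self.bias)

net = PartiallyConnectedNet(input_size=5, hidden_size=6, output_size=3)
print(net(torch.randn(8, 5)).shape)  # torch.Size([8, 3])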