I have a neural network that produces a single value when excited with an input. I need to use this value returned by the network to threshold another array. The result of this thresholding operation is used to compute a loss function (the value of the threshold is not known beforehand and needs to be arrived at by training). The following is an MWE:
import torch
x = torch.randn(10, 1) # Say this is the output of the network (10 is my batch size)
data_array = torch.randn(10, 2) # This is the data I need to threshold
ground_truth = torch.randn(10, 2) # This is the ground truth
mse_loss = torch.nn.MSELoss() # Loss function
# Threshold
thresholded_vals = data_array * (data_array >= x) # Returns zero in all places where the value is less than the threshold, the value itself otherwise
# Compute loss and gradients
loss = mse_loss(thresholded_vals, ground_truth)
loss.backward() # Throws error here
Since the thresholding operation returns a tensor that carries no gradient information, the backward() call throws an error.
How does one train a network in such a case?
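A quick check, continuing the MWE above (added for illustration, not part of the original snippet), confirms that nothing in the thresholded result requires a gradient:
print((data_array >= x).requires_grad)  # False: the comparison yields a boolean tensor outside the autograd graph
print(thresholded_vals.requires_grad)   # False: no input requires grad in this MWE, so there is no grad_fn
# loss.backward() then fails with a RuntimeError along the lines of:
# "element 0 of tensors does not require grad and does not have a grad_fn"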
CodePudding user response:
Your threshold operation is not differentiable with respect to the threshold, so torch does not compute a gradient for it; that is why your example is not working. The comparison data_array >= x is a step function of x: its derivative is zero everywhere except at the jump (where it is undefined), so no gradient can flow back to x. You can see this by enabling requires_grad on both inputs:
import torch
x = torch.randn(10, 1, requires_grad=True) # Say this is the output of the network (10 is my batch size)
data_array = torch.randn(10, 2, requires_grad=True) # This is the data I need to threshold
ground_truth = torch.randn(10, 2) # This is the ground truth
mse_loss = torch.nn.MSELoss() # Loss function
# Threshold
thresholded_vals = data_array * (data_array >= x) # Returns zero in all places where the value is less than the threshold, the value itself otherwise
# Compute loss and gradients
loss = mse_loss(thresholded_vals, ground_truth)
loss.backward() # No longer throws: gradients reach data_array, but not x
print(x.grad)
print(data_array.grad)
Output:
None #<- for the threshold x
tensor([[ 0.1088, -0.0617], #<- for the data_array
[ 0.1011, 0.0000],
[ 0.0000, 0.0000],
[-0.0000, -0.0000],
[ 0.2047, 0.0973],
[-0.0000, 0.2197],
[-0.0000, 0.0929],
[ 0.1106, 0.2579],
[ 0.0743, 0.0880],
[ 0.0000, 0.1112]])
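The answer above stops at this diagnosis. One common workaround (a sketch, not part of the original answer) is to replace the hard comparison with a smooth sigmoid relaxation so that a gradient reaches the threshold x; the sharpness k = 10.0 below is an arbitrary illustrative choice:
import torch
x = torch.randn(10, 1, requires_grad=True)           # stand-in for the network output (the threshold)
data_array = torch.randn(10, 2, requires_grad=True)  # data to threshold
ground_truth = torch.randn(10, 2)
mse_loss = torch.nn.MSELoss()
# Soft threshold: sigmoid(k * (data - x)) approaches the hard step as k grows,
# but remains differentiable in x
k = 10.0
soft_mask = torch.sigmoid(k * (data_array - x))
thresholded_vals = data_array * soft_mask
loss = mse_loss(thresholded_vals, ground_truth)
loss.backward()
print(x.grad)  # now a (10, 1) tensor of gradients instead of None
If the hard 0/1 behaviour is needed at inference time, increasing k during training or using a straight-through estimator are common follow-ups.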