How to update and calculate the derivatives of the weights and biases of a 3-layer neural network


I'm trying to create a 3-layer neural network with one input layer, one hidden layer, and one output layer. The input is a (1, 785) NumPy array, since I'm classifying digits from 0 to 9 on the MNIST dataset. My forward propagation produces arrays with the right dimensions. However, when I compute the derivatives of the weights and biases, their shapes come out different from the shapes of the original weight and bias arrays, so the gradient descent update fails: according to the NumPy documentation, broadcasting only works when corresponding dimensions are equal or one of them is 1.
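For example, this is the kind of shape mismatch I mean (the shapes here are just illustrative, not my actual arrays):

    import numpy as np

    # A (10, 64) parameter and a (64, 10) gradient cannot be combined:
    # the trailing dimensions (64 vs 10) are neither equal nor 1,
    # so the in-place update raises a broadcasting error.
    weights = np.zeros((10, 64))
    gradient = np.zeros((64, 10))

    try:
        weights -= 0.01 * gradient
    except ValueError as e:
        print(e)  # operands could not be broadcast together ...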

Here's the calculation of the derivatives of the weights and biases on the backpropagation:

    def backpropagation(self, x, y):
        predicted_value = self.forward_propagation(x)
        cost_value_derivative = self.loss_function(
                predicted_value.T, self.expected_value(y), derivative=True
            )
        print(f"{'-*-'*15} PREDICTION {'-*-'*15}")
        print(f"Predicted Value: {np.argmax(predicted_value)}")
        print(f"Actual Value: {y}")
        print(f"{'-*-'*15}{'-*-'*19}")

        derivative_W2 = (cost_value_derivative*self.sigmoid(
            self.output_layer_without_activity, derivative=True)
        ).dot(self.hidden_layer.T).T

        print(f"Derivative_W2: {derivative_W2.shape}, weights_hidden_layer_to_output_layer: {self.weights_hidden_layer_to_output_layer.shape}")
        assert derivative_W2.shape == self.weights_hidden_layer_to_output_layer.shape

        derivative_b2 = (cost_value_derivative*(self.sigmoid(
                self.output_layer_without_activity, derivative=True).T
        )).T

        print(f"Derivative_b2: {derivative_b2.shape}, bias_on_output_layer: {self.bias_on_output_layer.shape}")
        assert derivative_b2.shape == self.bias_on_output_layer.shape

        derivative_b1 = cost_value_derivative*self.sigmoid(
            self.output_layer_without_activity.T, derivative=True
        ).dot(self.weights_hidden_layer_to_output_layer.T).dot(
            self.sigmoid(self.hidden_layer_without_activity, derivative=True)
        )
        print(f"Derivative_b1: {derivative_b1.shape}, bias_on_hidden_layer: {self.bias_on_hidden_layer.shape}")

        assert derivative_b1.shape == self.bias_on_hidden_layer.shape

        derivative_W1 = cost_value_derivative*self.sigmoid(
            self.output_layer_without_activity.T, derivative=True
        ).dot(self.weights_hidden_layer_to_output_layer.T).dot(self.sigmoid(
                self.hidden_layer_without_activity, derivative=True)
        ).dot(x)

        print(f"Derivative_W1: {derivative_W1.shape}, weights_input_layer_to_hidden_layer: {self.weights_input_layer_to_hidden_layer.shape}")
        assert derivative_W1.shape == self.weights_input_layer_to_hidden_layer.shape

        return derivative_W2, derivative_b2, derivative_W1, derivative_b1

And here is the forward propagation that I implemented:

    def forward_propagation(self, x):

        self.hidden_layer_without_activity = self.weights_input_layer_to_hidden_layer.T.dot(x.T) + self.bias_on_hidden_layer

        self.hidden_layer = self.sigmoid(
            self.hidden_layer_without_activity
        )

        self.output_layer_without_activity = self.weights_hidden_layer_to_output_layer.T.dot(
            self.hidden_layer
        ) + self.bias_on_output_layer

        self.output_layer = self.sigmoid(
            self.output_layer_without_activity
        )

        return self.output_layer

The gradient descent update on the weights and biases, using weights_hidden_layer_to_output_layer as an example, is weights_hidden_layer_to_output_layer -= learning_rate*derivative_W2, where derivative_W2 is the derivative of the loss function with respect to weights_hidden_layer_to_output_layer.
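For completeness, the full update I run after each backpropagation call looks roughly like this (a simplified sketch; the method name gradient_descent_step and the learning rate value are just placeholders):

    def gradient_descent_step(self, x, y, learning_rate=0.01):
        # Sketch only: every derivative returned by backpropagation must have
        # exactly the same shape as the parameter it updates, otherwise the
        # in-place subtraction below fails to broadcast.
        dW2, db2, dW1, db1 = self.backpropagation(x, y)
        self.weights_hidden_layer_to_output_layer -= learning_rate * dW2
        self.bias_on_output_layer -= learning_rate * db2
        self.weights_input_layer_to_hidden_layer -= learning_rate * dW1
        self.bias_on_hidden_layer -= learning_rate * db1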

CodePudding user response:

Since you did not provide the definitions of your helper functions, it is hard to know exactly where things went wrong. However, I usually use the following snippet to train a network with one hidden layer and sigmoid activations throughout. I hope it helps you debug your code.

    for epoch in range(epochs):
        # forward propagation
        Z1 = np.dot(W1, X) + b1
        A1 = sigmoid(Z1)
        Z2 = np.dot(W2, A1) + b2
        A2 = sigmoid(Z2)

        # backward propagation
        dZ2 = A2 - Y
        dW2 = 1/m * np.dot(dZ2, A1.T)
        db2 = 1/m * np.sum(dZ2, axis=1, keepdims=True)
        dZ1 = np.dot(W2.T, dZ2) * A1 * (1 - A1)   # sigmoid derivative of the hidden layer
        dW1 = 1/m * np.dot(dZ1, X.T)
        db1 = 1/m * np.sum(dZ1, axis=1, keepdims=True)

        # update parameters
        W1 = W1 - alpha * dW1
        b1 = b1 - alpha * db1
        W2 = W2 - alpha * dW2
        b2 = b2 - alpha * db2

    print('W1: {}\n b1: {}\n W2: {}\n b2: {}'.format(W1, b1, W2, b2))
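To actually run the snippet you would still need to define sigmoid and initialize the data and parameters. A minimal setup might look like this (the shapes and placeholder data below are only an assumption, chosen for MNIST with a hidden layer of 64 units):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Placeholder data just to make the snippet runnable:
    # X has one column per example, Y holds one-hot encoded labels.
    m = 32                               # number of examples
    n_x, n_h, n_y = 784, 64, 10          # input, hidden and output sizes
    X = np.random.rand(n_x, m)
    Y = np.eye(n_y)[:, np.random.randint(0, n_y, m)]

    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    alpha = 0.1      # learning rate
    epochs = 100

Note that W1 has shape (n_h, n_x) and W2 has shape (n_y, n_h), so the gradients dW1 and dW2 computed above automatically come out with the same shapes as the weights they update.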