I'm trying to create a 3-layer neural network with one input layer, one hidden layer, and one output layer. The input is a (1, 785) NumPy array, since I'm classifying digits from 0 to 9 on the MNIST dataset. My forward propagation produces arrays with the right dimensions. However, when I compute the derivatives of the weights and biases, their shapes no longer match the shapes of the original parameters, so the gradient descent update fails: according to the NumPy documentation, broadcasting only works when the corresponding dimensions are equal or one of them is 1.
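To make the constraint concrete, here is a minimal sketch of the shapes I expect (the hidden size of 64 is an arbitrary example, and I'm assuming column-vector biases, matching my forward pass below). For the update W -= learning_rate * dW to broadcast, each gradient must have exactly the same shape as the parameter it updates:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden, n_out = 785, 64, 10    # 64 is an arbitrary hidden size

x  = np.random.rand(1, n_in)           # input, shape (1, 785)
W1 = np.random.rand(n_in, n_hidden)    # weights_input_layer_to_hidden_layer
b1 = np.random.rand(n_hidden, 1)       # bias_on_hidden_layer
W2 = np.random.rand(n_hidden, n_out)   # weights_hidden_layer_to_output_layer
b2 = np.random.rand(n_out, 1)          # bias_on_output_layer

z1 = W1.T.dot(x.T) + b1                # (n_hidden, 1)
z2 = W2.T.dot(sigmoid(z1)) + b2        # (n_out, 1)

# For the gradient descent update to broadcast, these must all hold:
# dW1.shape == W1.shape, db1.shape == b1.shape,
# dW2.shape == W2.shape, db2.shape == b2.shape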
Here's the calculation of the derivatives of the weights and biases on the backpropagation:
def backpropagation(self, x, y):
    predicted_value = self.forward_propagation(x)
    cost_value_derivative = self.loss_function(
        predicted_value.T, self.expected_value(y), derivative=True
    )
    print(f"{'-*-'*15} PREDICTION {'-*-'*15}")
    print(f"Predicted Value: {np.argmax(predicted_value)}")
    print(f"Actual Value: {y}")
    print(f"{'-*-'*15}{'-*-'*19}")

    derivative_W2 = (cost_value_derivative*self.sigmoid(
        self.output_layer_without_activity, derivative=True)
    ).dot(self.hidden_layer.T).T
    print(f"Derivative_W2: {derivative_W2.shape}, weights_hidden_layer_to_output_layer: {self.weights_hidden_layer_to_output_layer.shape}")
    assert derivative_W2.shape == self.weights_hidden_layer_to_output_layer.shape

    derivative_b2 = (cost_value_derivative*(self.sigmoid(
        self.output_layer_without_activity, derivative=True).T
    )).T
    print(f"Derivative_b2: {derivative_b2.shape}, bias_on_output_layer: {self.bias_on_output_layer.shape}")
    assert derivative_b2.shape == self.bias_on_output_layer.shape

    derivative_b1 = cost_value_derivative*self.sigmoid(
        self.output_layer_without_activity.T, derivative=True
    ).dot(self.weights_hidden_layer_to_output_layer.T).dot(
        self.sigmoid(self.hidden_layer_without_activity, derivative=True)
    )
    print(f"Derivative_b1: {derivative_b1.shape}, bias_on_hidden_layer: {self.bias_on_hidden_layer.shape}")
    assert derivative_b1.shape == self.bias_on_hidden_layer.shape

    derivative_W1 = cost_value_derivative*self.sigmoid(
        self.output_layer_without_activity.T, derivative=True
    ).dot(self.weights_hidden_layer_to_output_layer.T).dot(self.sigmoid(
        self.hidden_layer_without_activity, derivative=True)
    ).dot(x)
    print(f"Derivative_W1: {derivative_W1.shape}, weights_input_layer_to_hidden_layer: {self.weights_input_layer_to_hidden_layer.shape}")
    assert derivative_W1.shape == self.weights_input_layer_to_hidden_layer.shape

    return derivative_W2, derivative_b2, derivative_W1, derivative_b1
And here is the forward propagation that I implemented:
def forward_propagation(self, x):
    self.hidden_layer_without_activity = self.weights_input_layer_to_hidden_layer.T.dot(x.T) + self.bias_on_hidden_layer
    self.hidden_layer = self.sigmoid(
        self.hidden_layer_without_activity
    )
    self.output_layer_without_activity = self.weights_hidden_layer_to_output_layer.T.dot(
        self.hidden_layer
    ) + self.bias_on_output_layer
    self.output_layer = self.sigmoid(
        self.output_layer_without_activity
    )
    return self.output_layer
The gradient descent update on the weights and biases, using weights_hidden_layer_to_output_layer as an example, is weights_hidden_layer_to_output_layer -= learning_rate*derivative_W2, where derivative_W2 is the derivative of the loss function with respect to weights_hidden_layer_to_output_layer.
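For completeness, the full update step would look roughly like this (a sketch that assumes each derivative returned by backpropagation already has the same shape as the parameter it updates, which is exactly what currently fails):

derivative_W2, derivative_b2, derivative_W1, derivative_b1 = self.backpropagation(x, y)

self.weights_hidden_layer_to_output_layer -= learning_rate * derivative_W2
self.bias_on_output_layer -= learning_rate * derivative_b2
self.weights_input_layer_to_hidden_layer -= learning_rate * derivative_W1
self.bias_on_hidden_layer -= learning_rate * derivative_b1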
CodePudding user response:
Since you do not provide the definitions of your helper functions, it is hard to know exactly where it went wrong. However, I usually use the following snippet to train a network with one hidden layer and sigmoid activations everywhere. I hope it helps you debug your code.
for epoch in range(epochs):
    # forward propagation
    Z1 = np.dot(W1, X) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    # backward propagation (m is the number of training examples;
    # dZ2 = A2 - Y assumes a cross-entropy loss, for which the output-layer
    # delta simplifies to A2 - Y)
    dZ2 = A2 - Y
    dW2 = 1/m * np.dot(dZ2, A1.T)
    db2 = 1/m * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * A1 * (1 - A1)   # sigmoid derivative: A1 * (1 - A1)
    dW1 = 1/m * np.dot(dZ1, X.T)
    db1 = 1/m * np.sum(dZ1, axis=1, keepdims=True)
    # update parameters
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    print('W1: {}\n b1: {}\n W2: {}\n b2: {}'.format(W1, b1, W2, b2))
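The loop assumes X, Y, the parameters, m, alpha, epochs, and a sigmoid helper already exist. With this column-per-example layout every gradient automatically gets the same shape as its parameter (for example, dW1 = np.dot(dZ1, X.T)/m has shape (n_h, n_x), the same as W1). A minimal setup sketch, with placeholder sizes, random data, and an arbitrary initialization, could look like:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder sizes: n_x input features, n_h hidden units, n_y classes, m examples.
n_x, n_h, n_y, m = 784, 64, 10, 100
alpha, epochs = 0.1, 100

X = np.random.rand(n_x, m)                        # one column per training example
Y = np.eye(n_y)[np.random.randint(0, n_y, m)].T   # one-hot labels, shape (n_y, m)

W1 = np.random.randn(n_h, n_x) * 0.01             # (n_h, n_x)
b1 = np.zeros((n_h, 1))                           # (n_h, 1)
W2 = np.random.randn(n_y, n_h) * 0.01             # (n_y, n_h)
b2 = np.zeros((n_y, 1))                           # (n_y, 1)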