I found some interesting code on the internet, but for some reason it isn't working. I keep getting this error:
Loss at iteration 0: 0.3568797210347673
Traceback (most recent call last):
  File "d:\test.py", line 102, in <module>
    dW1, db1, dW2, db2 = backward(X, y, a1, predictions)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\Neural Network\Neural Network\test.py", line 78, in backward
    hidden_error = output_error.T.dot(W2) * sigmoid_derivative(a1)
                   ^^^^^^^^^^^^^^^^^^^^^^
ValueError: shapes (1,10) and (2,1) not aligned: 10 (dim 1) != 2 (dim 0)
Here's the code:
import sqlite3
import numpy as np
# Generate some training data
X = np.array([[i] for i in range(10)])
y = np.array([[i % 2] for i in range(10)])
# Create a connection to the database
conn = sqlite3.connect('nn.db')
# Create a table to store the training data
cursor = conn.cursor()
cursor.execute('CREATE TABLE IF NOT EXISTS training_data (input REAL, output REAL)')
# Save the training data to the database
cursor.executemany('INSERT INTO training_data VALUES (?, ?)', zip(X, y))
conn.commit()
# Define the neural network architecture
input_size = 1
hidden_size = 2
output_size = 1
# Initialize the weights and biases randomly
W1 = np.random.randn(input_size, hidden_size)
b1 = np.random.randn(hidden_size)
W2 = np.random.randn(hidden_size, output_size)
b2 = np.random.randn(output_size)
# Define the sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Define the derivative of the sigmoid function
def sigmoid_derivative(x):
    return x * (1 - x)
# Define the loss function
def loss(predictions, targets):
    return np.mean((predictions - targets) ** 2)
# Define the forward pass of the neural network
def forward(X):
    # Propagate the input through the first layer
    z1 = X.dot(W1) + b1
    a1 = sigmoid(z1)
    # Propagate the hidden layer output through the second layer
    z2 = a1.dot(W2) + b2
    a2 = sigmoid(z2)
    return a1, a2
# Define the backward pass of the neural network
def backward(X, y, a1, predictions):
    # Compute the error in the output layer
    output_error = y - predictions
    # Compute the gradient of the loss with respect to the output layer weights and biases
    dW2 = a1.T.dot(output_error * sigmoid_derivative(predictions))
    db2 = np.sum(output_error * sigmoid_derivative(predictions), axis=0)
    # Compute the error in the hidden layer
    hidden_error = output_error.T.dot(W2) * sigmoid_derivative(a1)
    # Compute the gradient of the loss with respect to the hidden layer weights and biases
    dW1 = X.T.dot(hidden_error.T)
    db1 = np.sum(hidden_error, axis=0)
    return dW1, db1, dW2, db2
# Define the learning rate
learning_rate = 0.1
# Train the neural network
for i in range(1000):
    # Perform the forward pass
    a1, predictions = forward(X)
    # Compute the loss
    l = loss(predictions, y)
    # Print the loss every 100 iterations
    if i % 100 == 0:
        print(f'Loss at iteration {i}: {l}')
    # Perform the backward pass
    dW1, db1, dW2, db2 = backward(X, y, a1, predictions)
    # Update the weights and biases
    W1 += learning_rate * dW1
    b1 += learning_rate * db1
    W2 += learning_rate * dW2
    b2 += learning_rate * db2
# Close the connection to the database
conn.close()
# Test the neural network on a new input
test_input = np.array([[5]])
predictions = forward(test_input)[1]
print(f'Prediction for test input {test_input}: {predictions}')
How do I fix this? I tried using transpose on basically everything, but I just kept getting the same error worded differently. By the way, the code is supposed to train itself to check whether a number is even or odd, and then save the seed used and the trained data in an SQLite database (at least that's what the blog says).
thanks
CodePudding user response:
Thank you for your question.
You had your matrix multiplication mixed up a little; here are the corrected versions of the two offending lines:
hidden_error = output_error.dot(W2.T) * sigmoid_derivative(a1)
and
dW1 = X.T.dot(hidden_error)
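Putting both fixes into your backward function, it should look like this (your code as posted, with only those two lines changed):
def backward(X, y, a1, predictions):
    # Compute the error in the output layer
    output_error = y - predictions
    # Compute the gradient of the loss with respect to the output layer weights and biases
    dW2 = a1.T.dot(output_error * sigmoid_derivative(predictions))
    db2 = np.sum(output_error * sigmoid_derivative(predictions), axis=0)
    # Compute the error in the hidden layer: transpose W2, not output_error,
    # so the shapes line up as (10, 1).dot(1, 2) -> (10, 2)
    hidden_error = output_error.dot(W2.T) * sigmoid_derivative(a1)
    # Compute the gradient of the loss with respect to the hidden layer weights and biases
    dW1 = X.T.dot(hidden_error)
    db1 = np.sum(hidden_error, axis=0)
    return dW1, db1, dW2, db2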
Additionally, watch the sign of your weight update. Gradient descent moves in the opposite direction of the loss gradient, which points 'uphill', because you are minimizing the loss. In your code, output_error = y - predictions is already the negative of the loss gradient, so the dW and db values you compute point 'downhill', and adding them with += is consistent. If you instead use the more common convention output_error = predictions - y, you have to subtract:
# With output_error = predictions - y in backward(), subtract the gradients
W1 -= learning_rate * dW1
b1 -= learning_rate * db1
W2 -= learning_rate * dW2
b2 -= learning_rate * db2
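Either way, a quick sanity check (a snippet of my own, not from your blog) is to print the loss for a few updates and confirm it goes down; if it climbs steadily instead, the error convention and the update sign do not match:
for i in range(5):
    a1, predictions = forward(X)
    print(f'iteration {i}: loss = {loss(predictions, y):.6f}')
    dW1, db1, dW2, db2 = backward(X, y, a1, predictions)
    # '+=' matches output_error = y - predictions; use '-=' with predictions - y
    W1 += learning_rate * dW1
    b1 += learning_rate * db1
    W2 += learning_rate * dW2
    b2 += learning_rate * db2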
Finally, with those fixes the model runs without errors, but it does not converge. The problem you chose is an interesting one, and the format of the input data is crucial: feeding in raw decimal numbers is insufficient (you will notice you get about 50% accuracy, meaning the model is just guessing 0 or 1). Please see this StackExchange post for more detail on how to get this NN to work; the usual trick is to feed the network a binary representation of each number.
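As a rough sketch of that idea (my own illustration with a hypothetical to_binary helper, not code from the blog): encode each number as a fixed-width bit vector and widen the input layer to match. With 4 bits, parity is simply the last bit, which a small network like yours can learn easily:
def to_binary(n, bits=4):
    # Encode an integer as a fixed-width bit vector, e.g. 5 -> [0, 1, 0, 1]
    return [int(b) for b in format(n, f'0{bits}b')]

X = np.array([to_binary(i) for i in range(10)])  # shape (10, 4)
y = np.array([[i % 2] for i in range(10)])       # parity labels unchanged
input_size = 4  # one input unit per bit
The rest of your script then works unchanged, as long as W1 is re-initialized with the new input_size.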