I have been trying to build a simple neural network (3 layers) myself to predict the MNIST dataset. I referenced some code online and wrote some parts on my own. The code runs without any errors, but something is wrong with the learning process: the prediction results look completely "random". Applying the learning process and then using the network to predict the same image gives me a different result every time. Could someone please give me some hints about where I went wrong?
import pandas as pd
import numpy as np
from PIL import Image
import os
np.set_printoptions(formatter={'float_kind':'{:f}'.format})
def init_setup():
    # three-layer perceptron: initialize weights and biases
    w1=np.random.randn(10,784)-0.8
    b1=np.random.rand(10,1)-0.8
    # second layer
    w2=np.random.randn(10,10)-0.8
    b2=np.random.randn(10,1)-0.8
    # third layer
    w3=np.random.randn(10,10)-0.8
    b3=np.random.randn(10,1)-0.8
    return w1,b1,w2,b2,w3,b3
def activate(A):
    # use the ReLU function as the activation function
    Z=np.maximum(0,A)
    return Z
def softmax(Z):
    return np.exp(Z)/np.sum(np.exp(Z))
def forward_propagation(A,w1,b1,w2,b2,w3,b3):
    # input A: (784,1) -> A1: (10,1) -> A2: (10,1) -> prob: (10,1)
    z1=w1@A+b1
    A1=activate(z1)
    z2=w2@A1+b2
    A2=activate(z2)
    z3=w3@A2+b3
    prob=softmax(z3)
    return z1,A1,z2,A2,z3,prob
def one_hot(Y:np.ndarray)->np.ndarray:
    one_hot=np.zeros((10, 1)).astype(int)
    one_hot[Y]=1
    return one_hot
def back_propagation(A,z1,A1:np.ndarray,z2,A2:np.ndarray,z3,prob,w1,w2:np.ndarray,w3,Y:np.ndarray,lr:float):
    m=1/Y.size
    dz3=prob-Y
    # print('loss ', np.sum(dz3))
    dw3=m*dz3@A2.T
    db3 = dz3
    dz2=ReLU_deriv(z2)*w3.T@dz3
    dw2 = dz2@A1.T
    db2 = dz2
    dz1=ReLU_deriv(z1)*w2.T@dz2
    dw1 = dz1@A.T
    db1 = dz1
    return db1,dw1,dw2,db2,dw3,db3
def ReLU_deriv(Z):
    Z[Z>0]=1
    Z[Z<=0]=0
    return Z
def step(lr,w1,b1,w2,b2,w3,b3,dw1,db1,dw2,db2,dw3,db3):
    w1 = w1 - lr * dw1
    b1 = b1 - lr * db1
    w2 = w2 - lr * dw2
    b2 = b2 - lr * db2
    w3 = w3 - lr * dw3
    b3 = b3 - lr * db3
    return w1,b1,w2,b2,w3,b3
Putting the functions together:
def learn():
    lr=0.00002
    w1,b1,w2,b2,w3,b3=init_setup()
    # read the data from a csv file
    df=pd.read_csv('data.csv')
    # shuffle the data
    df = df.sample(frac=1).reset_index(drop=True)
    for epoch in range(0,5):
        lr=lr/10
        for _,row in df.iterrows():
            A=row.values[1:]
            A=A.reshape(784,1)
            Y=int(row.values[0])
            Y=one_hot(Y)
            z1,A1,z2,A2,z3,prob=forward_propagation(A,w1,b1,w2,b2,w3,b3)
            db1,dw1,dw2,db2,dw3,db3=back_propagation(A,z1,A1,z2,A2,z3,prob,w1,w2,w3,Y,lr)
            w1,b1,w2,b2,w3,b3=step(lr,w1,b1,w2,b2,w3,b3,dw1,db1,dw2,db2,dw3,db3)
    return w1,b1,w2,b2,w3,b3
optimize_params=learn()
w1,b1,w2,b2,w3,b3=optimize_params
img=Image.open(r'C:\Users\Desktop\MNIST - JPG - training\2\16.jpg')
A=np.asarray(img)
A=A.reshape(-1,1)
z1,A1,z2,A2,z3,prob=forward_propagation(A,w1,b1,w2,b2,w3,b3)
print(prob)
print(np.argmax(prob))
Running the code three times, the results look like this:
>>>[[0.020815] >>>[[0.025916] >>>[[0.161880]
[0.019490] [0.031197] [0.104364]
[0.113170] [0.006868] [0.093192]
[0.051033] [0.426709] [0.041726]
[0.107867] [0.043123] [0.062953]
[0.009533] [0.001528] [0.324685]
[0.148977] [0.080894] [0.102557]
[0.333544] [0.273520] [0.043415]
[0.147408] [0.049245] [0.009269]
[0.048163]] [0.060999]] [0.055960]]
>>>7 >>>3 >>>5
Running the same code three times, I get three largely different results. I know there is randomness in neural networks, but shouldn't the parameters be about the same after the learning process? Could anyone please give me some hints or suggestions about where I went wrong in the learning process, or what causes the randomness in the results?
CodePudding user response:
Given that the code itself is correct, I would increase the learning rate and the number of epochs. You even decrease the learning rate every epoch (lr=lr/10), so it feels like the model doesn't have time to converge (to actually learn). For starters, I would fix the learning rate at 0.001 and increase the number of epochs to maybe 25. If your results get better, you can start fiddling around from there.
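For concreteness, here is a minimal sketch of what that change to your learn() function could look like, keeping all of your other functions exactly as they are (the LR and EPOCHS constants are just illustrative names, not anything from your code):

LR = 0.001    # fixed learning rate, no per-epoch division
EPOCHS = 25   # more passes over the training data

def learn():
    w1,b1,w2,b2,w3,b3 = init_setup()
    df = pd.read_csv('data.csv')
    df = df.sample(frac=1).reset_index(drop=True)
    for epoch in range(EPOCHS):
        for _, row in df.iterrows():
            A = row.values[1:].reshape(784, 1)
            Y = one_hot(int(row.values[0]))
            # forward pass, gradients, and parameter update with the same fixed step size
            z1,A1,z2,A2,z3,prob = forward_propagation(A, w1, b1, w2, b2, w3, b3)
            db1,dw1,dw2,db2,dw3,db3 = back_propagation(A, z1, A1, z2, A2, z3, prob, w1, w2, w3, Y, LR)
            w1,b1,w2,b2,w3,b3 = step(LR, w1, b1, w2, b2, w3, b3, dw1, db1, dw2, db2, dw3, db3)
    return w1,b1,w2,b2,w3,b3

That way every per-sample update uses the same step size across all 25 epochs; once the loss is clearly decreasing you can reintroduce a learning-rate schedule and tune from there.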