I have been trying to build a simple neural network (3 layers) myself to predict the MNIST dataset. I referenced some code online and wrote some parts on my own. The code runs without any errors, but something is wrong with the learning process: the prediction results look completely "random". Applying the learning process and then using the network to predict the same image gives me a different result every time. Could someone please give me some hints about where I went wrong?
import pandas as pd
import numpy as np
from PIL import Image
import os
np.set_printoptions(formatter={'float_kind':'{:f}'.format})
def init_setup():
    # three-layer perceptron: initialize weights and biases
    w1=np.random.randn(10,784)-0.8
    b1=np.random.rand(10,1)-0.8
    # second layer
    w2=np.random.randn(10,10)-0.8
    b2=np.random.randn(10,1)-0.8
    # third layer
    w3=np.random.randn(10,10)-0.8
    b3=np.random.randn(10,1)-0.8
    return w1,b1,w2,b2,w3,b3
def activate(A):
    # use the ReLU function as the activation function
    Z=np.maximum(0,A)
    return Z
def softmax(Z):
    return np.exp(Z)/np.sum(np.exp(Z))
def forward_propagation(A,w1,b1,w2,b2,w3,b3):
    # input A: (784,1) -> A1: (10,1) -> A2: (10,1) -> prob: (10,1)
    z1=w1@A+b1
    A1=activate(z1)
    z2=w2@A1+b2
    A2=activate(z2)
    z3=w3@A2+b3
    prob=softmax(z3)
    return z1,A1,z2,A2,z3,prob
def one_hot(Y:np.ndarray)->np.ndarray:
    one_hot=np.zeros((10, 1)).astype(int)
    one_hot[Y]=1
    return one_hot
def back_propagation(A,z1,A1:np.ndarray,z2,A2:np.ndarray,z3,prob,w1,w2:np.ndarray,w3,Y:np.ndarray,lr:float):
    m=1/Y.size
    dz3=prob-Y
    # print('loss ', np.sum(dz3))
    dw3=m*dz3@A2.T
    db3 = dz3
    dz2=ReLU_deriv(z2)*w3.T@dz3
    dw2 = dz2@A1.T
    db2 = dz2
    dz1=ReLU_deriv(z1)*w2.T@dz2
    dw1 = dz1@A.T
    db1 = dz1
    return db1,dw1,dw2,db2,dw3,db3
def ReLU_deriv(Z):
    Z[Z>0]=1
    Z[Z<=0]=0
    return Z
def step(lr,w1,b1,w2,b2,w3,b3,dw1,db1,dw2,db2,dw3,db3):
    w1 = w1 - lr * dw1
    b1 = b1 - lr * db1
    w2 = w2 - lr * dw2
    b2 = b2 - lr * db2
    w3 = w3 - lr * dw3
    b3 = b3 - lr * db3
    return w1,b1,w2,b2,w3,b3
Putting the functions together:
def learn():
    lr=0.00002
    w1,b1,w2,b2,w3,b3=init_setup()
    # read the data from a csv file
    df=pd.read_csv('data.csv')
    # shuffle the data
    df = df.sample(frac=1).reset_index(drop=True)
    for epoch in range(0,5):
        lr=lr/10
        for _,row in df.iterrows():
            A=row.values[1:]
            A=A.reshape(784,1)
            Y=int(row.values[0])
            Y=one_hot(Y)
            z1,A1,z2,A2,z3,prob=forward_propagation(A,w1,b1,w2,b2,w3,b3)
            db1,dw1,dw2,db2,dw3,db3=back_propagation(A,z1,A1,z2,A2,z3,prob,w1,w2,w3,Y,lr)
            w1,b1,w2,b2,w3,b3=step(lr,w1,b1,w2,b2,w3,b3,dw1,db1,dw2,db2,dw3,db3)
    return w1,b1,w2,b2,w3,b3
optimize_params=learn()
w1,b1,w2,b2,w3,b3=optimize_params
img=Image.open(r'C:\Users\Desktop\MNIST - JPG - training\2\16.jpg')
A=np.asarray(img)
A=A.reshape(-1,1)
z1,A1,z2,A2,z3,prob=forward_propagation(A,w1,b1,w2,b2,w3,b3)
print(prob)
print(np.argmax(prob))
Running the code three times, the results look like this:
>>>[[0.020815] >>>[[0.025916] >>>[[0.161880]
[0.019490] [0.031197] [0.104364]
[0.113170] [0.006868] [0.093192]
[0.051033] [0.426709] [0.041726]
[0.107867] [0.043123] [0.062953]
[0.009533] [0.001528] [0.324685]
[0.148977] [0.080894] [0.102557]
[0.333544] [0.273520] [0.043415]
[0.147408] [0.049245] [0.009269]
[0.048163]] [0.060999]] [0.055960]]
>>>7 >>>3 >>>5
Running the same code three times, I get three largely different results. I know there is randomness in neural networks, but shouldn't the parameters be about the same after the learning process? Could anyone please give me some hints or suggestions about where I went wrong in the learning process, or what causes the randomness in the results?
CodePudding user response:
Given that the code itself is correct, I would increase the learning rate and the number of epochs. You even decrease the learning rate every epoch (lr=lr/10), so it feels like the model doesn't have time to converge (to actually learn). For starters, I would fix the learning rate at 0.001 and increase the number of epochs to maybe 25. If your results get better, you can start fiddling around from there.
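For concreteness, here is a minimal sketch of what that change to your learn() function could look like, keeping all of your other functions exactly as they are (the LR and EPOCHS constants are just illustrative names, not anything from your code):

LR = 0.001    # fixed learning rate, no per-epoch division
EPOCHS = 25   # more passes over the training data

def learn():
    w1,b1,w2,b2,w3,b3 = init_setup()
    df = pd.read_csv('data.csv')
    df = df.sample(frac=1).reset_index(drop=True)
    for epoch in range(EPOCHS):
        for _, row in df.iterrows():
            A = row.values[1:].reshape(784, 1)
            Y = one_hot(int(row.values[0]))
            # forward pass, gradients, and parameter update with the same fixed step size
            z1,A1,z2,A2,z3,prob = forward_propagation(A, w1, b1, w2, b2, w3, b3)
            db1,dw1,dw2,db2,dw3,db3 = back_propagation(A, z1, A1, z2, A2, z3, prob, w1, w2, w3, Y, LR)
            w1,b1,w2,b2,w3,b3 = step(LR, w1, b1, w2, b2, w3, b3, dw1, db1, dw2, db2, dw3, db3)
    return w1,b1,w2,b2,w3,b3

That way every per-sample update uses the same step size across all 25 epochs; once the loss is clearly decreasing you can reintroduce a learning-rate schedule and tune from there.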