I have been trying to build a simple neural network myself (3 layers) to predict the MNIST dataset. I referenced some code online and wrote some parts on my own. The code runs without any errors, but something is wrong with the learning process: the trained network always gives me wrong predictions, and one or two classes always get very high probability no matter what I pass in as input. I have tried to figure out the problem but have made no progress in a few days. Could anyone give me some hints about where I went wrong?
import numpy as np
from PIL import Image
import os
np.set_printoptions(formatter={'float_kind':'{:f}'.format})
def init_setup():
    # three-layer perceptron
    w1=np.random.randn(10,784)-0.8
    b1=np.random.rand(10,1)-0.8
    # second layer
    w2=np.random.randn(10,10)-0.8
    b2=np.random.randn(10,1)-0.8
    # third layer
    w3=np.random.randn(10,10)-0.8
    b3=np.random.randn(10,1)-0.8
    return w1,b1,w2,b2,w3,b3
def activate(A):
    # use the ReLU function as the activation function
    Z=np.maximum(0,A)
    return Z
def softmax(Z):
    return np.exp(Z)/np.sum(np.exp(Z))
def forward_propagation(A,w1,b1,w2,b2,w3,b3):
    # input A: (784,1) -> A1: (10,1) -> A2: (10,1) -> prob: (10,1)
    z1=w1@A+b1
    A1=activate(z1)
    z2=w2@A1+b2
    A2=activate(z2)
    z3=w3@A2+b3
    prob=softmax(z3)
    return z1,A1,z2,A2,z3,prob
def one_hot(Y:np.ndarray)->np.ndarray:
    one_hot=np.zeros((10, 1)).astype(int)
    one_hot[Y]=1
    return one_hot
def back_propagation(A,z1,A1:np.ndarray,z2,A2:np.ndarray,z3,prob,w1,w2:np.ndarray,w3,Y:np.ndarray,lr:float):
    m=1/Y.size
    dz3=prob-Y
    dw3=m*[email protected]
    db3= dz3
    dz2=ReLU_deriv(z2)*w3.T@dz3
    dw2 = [email protected]
    db2 = dz2
    dz1=ReLU_deriv(z1)*w2.T@dz2
    dw1 = [email protected]
    db1 = dz1
    return db1,dw1,dw2,db2,dw3,db3
def ReLU_deriv(Z):
    Z[Z>0]=1
    Z[Z<=0]=0
    return Z
def step(lr,w1,b1,w2,b2,w3,b3,dw1,db1,dw2,db2,dw3,db3):
    w1 = w1 - lr * dw1
    b1 = b1 - lr * db1
    w2 = w2 - lr * dw2
    b2 = b2 - lr * db2
    w3 = w3 - lr * dw3
    b3 = b3 - lr * db3
    return w1,b1,w2,b2,w3,b3
# put the functions together
def learn():
    lr=0.5
    dir=r'C:\Users\Desktop\MNIST - JPG - training\{}'
    w1,b1,w2,b2,w3,b3=init_setup()
    for e in range(10):
        if e%3 == 0:
            lr=lr/10
        for num in range(10):
            Y=one_hot(num)
            # print(Y)
            path=dir.format(str(num))
            for i in os.listdir(path):
                img=Image.open(path+'\\'+i)
                A=np.asarray(img)
                A=A.reshape(-1,1)
                z1,A1,z2,A2,z3,prob=forward_propagation(A,w1,b1,w2,b2,w3,b3)
                # print('loss='+str(np.sum(np.abs(Y-prob))))
                db1,dw1,dw2,db2,dw3,db3=back_propagation(A,z1,A1,z2,A2,z3,prob,w1,w2,w3,Y,lr)
                w1,b1,w2,b2,w3,b3=step(lr,w1,b1,w2,b2,w3,b3,dw1,db1,dw2,db2,dw3,db3)
    return w1,b1,w2,b2,w3,b3
optimize_params=learn()
w1,b1,w2,b2,w3,b3=optimize_params
img=Image.open(r'C:\Users\Desktop\MNIST - JPG - training\2\5.jpg')
A=np.asarray(img)
A=A.reshape(-1,1)
z1,A1,z2,A2,z3,prob=forward_propagation(A,w1,b1,w2,b2,w3,b3)
print(prob)
print(np.argmax(prob))
After running the learn function, the network gave me something like this:
>>>[[0.040939]
[0.048695]
[0.048555]
[0.054962]
[0.060614]
[0.066957]
[0.086470]
[0.117370]
[0.163163]
[0.312274]]
>>>9
The result is obviously wrong: the true label should be 2, but as you can see in prob, class 2 has an extremely low probability, so I believe something must be wrong in the learning process. I have no clue at all; can someone please give me some hints?
CodePudding user response:
Your current code only trains on labels 0 and 1:
for num in range(2):
so there is no way for your model to "know" about any other labels.
Your model is also trained in a very ordered way, and therefore has a bias towards the last classes, since those are the last ones it saw during training. You should shuffle your training data in each epoch and not feed the network class by class; a rough sketch of that idea follows.
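As a minimal sketch (not the asker's original code), this is one way the training loop could be restructured so every epoch sees all ten classes in a random order. It reuses the question's own helpers (init_setup, one_hot, forward_propagation, back_propagation, step) and assumes the same "MNIST - JPG - training\<digit>" folder layout; the function name learn_shuffled, the epochs parameter, and the fixed lr value are made up for illustration:

import os
import random
import numpy as np
from PIL import Image

def learn_shuffled(epochs=10, lr=0.05):
    base = r'C:\Users\Desktop\MNIST - JPG - training\{}'
    # collect every (file path, label) pair once, across all ten classes
    samples = []
    for num in range(10):
        path = base.format(num)
        for name in os.listdir(path):
            samples.append((path + '\\' + name, num))
    w1, b1, w2, b2, w3, b3 = init_setup()
    for e in range(epochs):
        random.shuffle(samples)  # new class-mixed order every epoch
        for file, num in samples:
            A = np.asarray(Image.open(file)).reshape(-1, 1)
            Y = one_hot(num)
            z1, A1, z2, A2, z3, prob = forward_propagation(A, w1, b1, w2, b2, w3, b3)
            db1, dw1, dw2, db2, dw3, db3 = back_propagation(A, z1, A1, z2, A2, z3, prob, w1, w2, w3, Y, lr)
            w1, b1, w2, b2, w3, b3 = step(lr, w1, b1, w2, b2, w3, b3, dw1, db1, dw2, db2, dw3, db3)
    return w1, b1, w2, b2, w3, b3

Because the labels are now interleaved rather than presented block by block, the last few classes no longer dominate the final weight updates of each epoch.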