MNIST numpy neural network accuracy hovering at 10%-CodePudding

Hi I've been working on a neural network to tackle the MNIST dataset, but when I run the code the accuracy begins to increase but eventually results in 0.098 accuracy, I also encounter an overflow error in exp when calculating the SoftMax values. I have tried to debug my code but I don't understand where I'm going wrong. If anyone can point me in the right direction that would be great and if you can't find an error could you give me any tips on techniques to try to debug this. Thanks in advance.

import numpy as np
import pandas as pd
df = pd.read_csv('../input/digit-recognizer/train.csv')
data = np.array(df.values)
data = data.T
data
Y = data[0,:]
X = data[1:,:]
Y_train = Y[:41000]
X_train = X[:,:41000]
X_train = X_train/255
Y_val = Y[41000:]
X_val = X[:,41000:]
X_val = X_val/255
print(np.max(X_train))
class NeuralNetwork:
    def __init__(self, n_in, n_out):
        self.w1, self.b1 = self.Generate_Weights_Biases(10,784)
        self.w2, self.b2 = self.Generate_Weights_Biases(10,10)
    def Generate_Weights_Biases(self, n_in, n_out):
        weights = 0.01*np.random.randn(n_in, n_out)
        biases = np.zeros((n_in,1))
        return weights, biases
    def forward(self, X):
        self.Z1 = self.w1.dot(X)   self.b1
        self.a1 = self.ReLu(self.Z1)
        self.z2 = self.w2.dot(self.a1)   self.b2
        y_pred = self.Softmax(self.z2)
        return y_pred
    def ReLu(self, Z):
        return np.maximum(Z,0)
    def Softmax(self, Z):
        #exponentials = np.exp(Z)
        #sumexp = np.sum(np.exp(Z), axis=0) 
        #print(Z)
        return np.exp(Z)/np.sum(np.exp(Z))
        
    def ReLu_Derv(self, x):
        return np.greaterthan(x, 0).astype(int)
    def One_hot_encoding(self, Y):
        one_hot = np.zeros((Y.size, 10))
        rows = np.arange(Y.size)
        one_hot[rows, Y] = 1
        one_hot = one_hot.T
        return one_hot
    def Get_predictions(self, y_pred):
        return np.argmax(y_pred, 0)
    def accuracy(self, pred, Y):
        return np.sum(pred == Y)/Y.size
    def BackPropagation(self, X, Y, y_pred, lr=0.01):
        m = Y.size
        one_hot_y = self.One_hot_encoding(Y)
        e2 = y_pred - one_hot_y
        derW2 = (1/m)* e2.dot(self.a1.T)
        derB2 =(1/m) * e2
        #derB2 = derB2.reshape(10,1)
        e1 = self.w2.T.dot(e2) * self.ReLu(self.a1)
        derW1 = (1/m) * e1.dot(X.T)
        derB1 = (1/m) * e1
        #derB1 = derB1.reshape(10,1)
        self.w1 = self.w1 - lr*derW1
        self.b1 = self.b1 - lr*np.sum(derB1, axis=1, keepdims=True)
        self.w2 = self.w2 - lr*derW2
        self.b2 = self.b2 - lr*np.sum(derB2, axis=1, keepdims=True)
    def train(self, X, Y, epochs = 1000):
        for i in range(epochs):
            y_pred = self.forward(X)
            predict = self.Get_predictions(y_pred)
            accuracy = self.accuracy(predict, Y)
            print(accuracy)
            self.BackPropagation(X, Y, y_pred)
        return self.w1, self.b1, self.w2, self.b2
    
NN = NeuralNetwork(X_train, Y_train)
w1,b1,w2,b2 = NN.train(X_train,Y_train)

CodePudding user response：

I found the following errors:

Your softmax implementation doesn't work because of terrific numeric errors you get exponentiating potentially large numbers to obtain something between 0 and 1. And besides, you forgot to specify the summation axis in the denominator. Here is a working implementation:

    def Softmax(self, Z):
        e = np.exp(Z - Z.max(axis=0, keepdims=True))
        return e/e.sum(axis=0, keepdims=True)

(Here and below I skip coding-style remarks that are not essential in this context. Like that this should be a class method or a stand-alone function etc.)

Your ReLu derivative implementation doesn't work for me at all. May be I have a different numpy version. This one works:

    def ReLu_Derv(self, x):
        return (x > 0).astype(int)

You need to actually use this implementation in BackPropagation:

        e1 = self.w2.T.dot(e2) * self.ReLu_Derv(self.a1)

With these amendments, I managed to achieve 91.0% accuracy after 100 iteration with LR=0.1. I loaded MNIST from Keras with this code:

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
X = train_images.reshape(-1, 28*28).T
Y = train_labels