Cost function computation for neural network

I am in week 5 of Andrew Ng's Machine Learning Course on Coursera. I am working through the programming assignment in Matlab for this week, and I chose to use a for loop implementation to compute the cost J. Here is my function.

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices. 

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network

Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));


% Setup some useful variables
m = size(X, 1);

% add bias to X to create 5000x401 matrix
X = [ones(m, 1) X];
         
% You need to return the following variables correctly 
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));


% initialize summing terms used in cost expression
sum_i = 0.0;

% loop through each sample to calculate the cost
for i = 1:m

    % logical vector output for 1 example
    y_i = zeros(num_labels, 1);
    class = y(m);
    y_i(class) = 1;
    
    % first layer just equals features in one example 1x401
    a1 = X(i, :);
    
    % compute z2, a 25x1 vector
    z2 = Theta1*a1';
    
    % compute activation of z2
    a2 = sigmoid(z2);
    
    % add bias to a2 to create a 26x1 vector
    a2 = [1; a2];
    
    % compute z3, a 10x1 vector
    z3 = Theta2*a2;
    
    %compute activation of z3. returns output vector of size 10x1
    a3 = sigmoid(z3);
    h = a3;
    
    % loop through each class k to sum cost over each class
    for k = 1:num_labels        
        
        % sum_i returns cost summed over each class
        sum_i = sum_i + ((-1*y_i(k) * log(h(k))) - ((1 - y_i(k)) * log(1 - h(k))));
        
    end
        
end

J = sum_i/m;

I understand that a vectorized implementation of this would be easier, but I do not understand why this implementation is wrong. When num_labels = 10, this function outputs J = 8.47, but the expected cost is 0.287629. I computed J from this formula. Am I misunderstanding the computation? My understanding is that the cost for each of the 10 classes is computed for each training example, and then the costs for all 10 classes are summed together for each example. Is that incorrect? Or did I not implement this properly in my code? Thanks in advance.
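
The formula itself is not reproduced in the text of this post. A reconstruction consistent with the loop in the code, assuming K = num_labels, that y_k^{(i)} corresponds to y_i(k), that (h_\Theta(x^{(i)}))_k corresponds to h(k), and that there is no regularization term, would be:

```latex
J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
  \left[ -y_k^{(i)} \log\!\left( \left( h_\Theta(x^{(i)}) \right)_k \right)
         - \left( 1 - y_k^{(i)} \right) \log\!\left( 1 - \left( h_\Theta(x^{(i)}) \right)_k \right) \right]
```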

CodePudding user response：

The problem is in the formula you are implementing.

The expression ((-1*y_i(k) * log(h(k))) - ((1 - y_i(k)) * log(1 - h(k)))) represents the loss for binary classification, where you have only 2 classes, so either

y_i is 0, so (1 - y_i) = 1
y_i is 1, so (1 - y_i) = 0

So you basically take into account only the target class probability.
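
The two cases above can be illustrated with a minimal sketch. The value of h and the helper name term are made up for illustration and are not part of the assignment:

```matlab
% Per-output cross-entropy term, written as in the question's inner loop.
% "term" is a hypothetical helper name used only in this sketch.
term = @(y, h) (-y .* log(h)) - ((1 - y) .* log(1 - h));

h = 0.9;                 % hypothetical predicted probability

cost_y1 = term(1, h);    % y = 1: the second part vanishes, leaving -log(h)
cost_y0 = term(0, h);    % y = 0: the first part vanishes, leaving -log(1 - h)

fprintf('y = 1: %.4f   y = 0: %.4f\n', cost_y1, cost_y0);
```

For a label that is exactly 0 or 1, only one of the two log terms contributes.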

However, with 10 labels as you mention, it is not necessarily the case that one of y_i and (1 - y_i) is 0 and the other is 1.

You should correct the loss function implementation so that it takes into account only the probability of the target class, not the other classes.

CodePudding user response：

My problem was with indexing. Instead of class = y(m), it should be class = y(i): i is the loop index, while m is 5000, the number of rows in the training data, so y(m) always returns the label of the last example.
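
A minimal sketch of the loop with that fix, checked against a vectorized computation. The sizes and values below are made-up toy data, not the assignment's data, and the anonymous sigmoid is a stand-in for the course's sigmoid.m, so the printed cost will not be 0.287629:

```matlab
% Toy problem: hypothetical sizes and deterministic made-up values.
m = 5; input_layer_size = 4; hidden_layer_size = 3; num_labels = 3;
X = reshape(sin(1:m * input_layer_size), m, input_layer_size);
y = [1; 2; 3; 1; 2];
Theta1 = reshape(cos(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, input_layer_size + 1);
Theta2 = reshape(sin(1:num_labels * (hidden_layer_size + 1)), ...
                 num_labels, hidden_layer_size + 1);

sigmoid = @(z) 1 ./ (1 + exp(-z));    % stand-in for the course's sigmoid.m

% Loop version with the indexing fix: use the label of example i, not m.
J_loop = 0;
for i = 1:m
    y_i = zeros(num_labels, 1);
    y_i(y(i)) = 1;                    % one-hot target for example i
    a1 = [1; X(i, :)'];               % input features plus bias unit
    a2 = [1; sigmoid(Theta1 * a1)];   % hidden activations plus bias unit
    h  = sigmoid(Theta2 * a2);        % output activations
    J_loop = J_loop + sum(-y_i .* log(h) - (1 - y_i) .* log(1 - h));
end
J_loop = J_loop / m;

% Vectorized cross-check: the same cost for all examples at once.
I  = eye(num_labels);
Y  = I(y, :);                         % m x num_labels one-hot matrix
A2 = [ones(m, 1) sigmoid([ones(m, 1) X] * Theta1')];
H  = sigmoid(A2 * Theta2');
J_vec = sum(sum(-Y .* log(H) - (1 - Y) .* log(1 - H))) / m;

fprintf('loop: %.6f   vectorized: %.6f\n', J_loop, J_vec);
```

The two values should agree to within floating-point error, which is a quick way to confirm that the loop indexes y correctly.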