How is the resultant likelihood matrix from a multivariate Gaussian distribution used in Bayesian cl

Time:12-12

I am attempting to write a Bayesian classification algorithm using a multivariate Gaussian distribution, but I can't figure out how to extract the probabilities from the matrix that results from this equation:

[Image: multivariate Gaussian density equation]

So far I have managed to classify a test point with two features into one of two classes using univariate Gaussian distributions. However, I would like to use six features, and the univariate approach would then require quite a lot of code. For the univariate version I used the following equations:

[Image: univariate Gaussian density and Bayes' rule equations]

And my MATLAB code:

grass_trained_data = [300, 5; 278, 6; 320, 4];
water_trained_data = [320, 12; 329, 8; 305, 9];

test_data = [300, 5]; 
td_x1 = test_data(1);
td_x2 = test_data(2);

% P(grass)
p_grass = 3/6;

% P(water)
p_water = 3/6;

% Calculate P(x | grass)
grass_td_mean_vector = mean(grass_trained_data);
grass_td_mean_x1 = grass_td_mean_vector(1);  % the mean of x1 in the trained data
grass_td_mean_x2 = grass_td_mean_vector(2);  % the mean of x2 in the trained data

grass_td_variance = var(grass_trained_data);  % the variance of x1 and x2
grass_td_variance_x1 = grass_td_variance(1);
grass_td_variance_x2 = grass_td_variance(2);

p_x1_given_grass = (2 * pi * grass_td_variance_x1).^(-1/2) * exp(-1/2 * ((td_x1 - grass_td_mean_x1).^2) / grass_td_variance_x1);
p_x2_given_grass = (2 * pi * grass_td_variance_x2).^(-1/2) * exp(-1/2 * ((td_x2 - grass_td_mean_x2).^2) / grass_td_variance_x2);

p_x_given_grass = p_x1_given_grass * p_x2_given_grass;

% Calculate P(x | water)
water_td_mean_vector = mean(water_trained_data);
water_td_mean_x1 = water_td_mean_vector(1);  % the mean of x1 in the trained data
water_td_mean_x2 = water_td_mean_vector(2);  % the mean of x2 in the trained data

water_td_variance = var(water_trained_data);  % the variance of x1 and x2
water_td_variance_x1 = water_td_variance(1);
water_td_variance_x2 = water_td_variance(2);

p_x1_given_water = (2 * pi * water_td_variance_x1).^(-1/2) * exp(-1/2 * ((td_x1 - water_td_mean_x1).^2) / water_td_variance_x1);
p_x2_given_water = (2 * pi * water_td_variance_x2).^(-1/2) * exp(-1/2 * ((td_x2 - water_td_mean_x2).^2) / water_td_variance_x2);

p_x_given_water = p_x1_given_water * p_x2_given_water;

% Now calculate the probability of the test data belonging to each class
p_grass_given_x = (p_x_given_grass * p_grass) / (p_x_given_grass * p_grass + p_x_given_water * p_water);
p_water_given_x = (p_x_given_water * p_water) / (p_x_given_grass * p_grass + p_x_given_water * p_water);

The univariate code above gives P(X | grass) as a single probability from 0 to 1. However, in the code for my multivariate attempt:

grass_trained_data = [300, 5; 278, 6; 320, 4];
water_trained_data = [320, 12; 329, 8; 305, 9];

test_data = [300, 5];

% P(grass)
p_grass = 3/6;

% P(water)
p_water = 3/6;

% Calculate P(x | grass)
grass_td_mean_vector = mean(grass_trained_data);
grass_td_cov_matrix = cov(grass_trained_data);

p_x_given_grass = (2 .* pi).^(-1/2) .* det(grass_td_cov_matrix).^(-1/2) .* exp((-1/2) .* transpose(test_data - grass_td_mean_vector) .* inv(grass_td_cov_matrix) .* (test_data - grass_td_mean_vector))

This gives P(X | grass) as a 2 x 2 matrix. What is the meaning of each of these values, and how can I use them?

Answer:

Solved. The multivariate equation expects column vectors, so I changed the mean vectors and my test data vector into column vectors. That also let me use matrix multiplication (*) rather than element-wise multiplication (.*), so the exponent collapses to a scalar: (1 x 2) * (2 x 2) * (2 x 1). I now get a single value from 0 to 1.
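
For anyone landing here later, the fix described above can be sketched roughly as follows. This is not the exact code from the answer (which was not posted), only one way to write it. The helper name mvn_density is made up for illustration, and two details are assumptions that differ from the question's code: the normalising constant is written as (2*pi)^(-d/2), the usual form for d features, and S \ v is used in place of inv(S) * v. Note also that the result is a density value, so it can in principle exceed 1; only the posteriors are guaranteed to lie between 0 and 1.

```matlab
% Training data from the question: one row per sample, one column per feature
grass_trained_data = [300, 5; 278, 6; 320, 4];
water_trained_data = [320, 12; 329, 8; 305, 9];

x = [300; 5];   % test point as a d x 1 COLUMN vector

% Priors, as in the question
p_grass = 3/6;
p_water = 3/6;

% Hypothetical helper: multivariate Gaussian density at x.
% x and mu are d x 1 column vectors, S is the d x d covariance matrix.
% (x - mu)' * (S \ (x - mu)) is (1 x d) * (d x 1), i.e. a scalar.
mvn_density = @(x, mu, S) (2 * pi)^(-numel(x) / 2) * det(S)^(-1/2) ...
    * exp(-1/2 * (x - mu)' * (S \ (x - mu)));

% mean() returns a 1 x d row vector, so transpose it into a column
grass_td_mean_vector = mean(grass_trained_data)';
water_td_mean_vector = mean(water_trained_data)';
grass_td_cov_matrix = cov(grass_trained_data);
water_td_cov_matrix = cov(water_trained_data);

% Class-conditional densities p(x | class), each a single scalar
p_x_given_grass = mvn_density(x, grass_td_mean_vector, grass_td_cov_matrix);
p_x_given_water = mvn_density(x, water_td_mean_vector, water_td_cov_matrix);

% Bayes' rule: posterior probability of each class given x
evidence = p_x_given_grass * p_grass + p_x_given_water * p_water;
p_grass_given_x = (p_x_given_grass * p_grass) / evidence;
p_water_given_x = (p_x_given_water * p_water) / evidence;
```

Because the same (2*pi) factor appears in every class density, it cancels in the posteriors, so the choice of exponent does not change which class wins. The same code should work unchanged for six features as long as each class has enough training samples for its covariance matrix to be invertible.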
