How to plot and estimate empirical CDF and cdf in matlab

Time:01-06

This question has been asked several times, but mine differs a little from the earlier ones. I have a table of x values and their relative frequencies:

x 150 250 350 450 550 650 750
y 1 2 8 30 18 16 5

I don't really understand what MATLAB's built-in function [f,x] = ecdf(y) does. I use it to estimate and plot an empirical distribution function, but the result is clearly not correct: if you build a histogram from the same data (x and y), the resulting ECDF does not describe that distribution.


So my question is: how do I correctly construct the ECDF from the table (the x values together with an array of relative frequencies), and then estimate and plot the cumulative distribution function directly from it?

My code for plotting the histogram and the ECDF:

%% data
y = [1; 2; 8; 30; 18; 16; 5];
x = [150; 250; 350; 450; 550; 650; 750];
%% hist and polygon
figure(1)
bar(x,y,'LineWidth',1,...
    'FaceColor',[0.0745098039215686 0.623529411764706 1],...
    'EdgeColor',[0.149019607843137 0.149019607843137 0.149019607843137],...
    'BarWidth',1,...
    'BarLayout','stacked');
hold on
plot(x,y,'-o','Color','red','LineWidth',1)
hold off

%% ecdf
[ff,x] = ecdf(y);
x_e = [0;x];
figure(2)
stairs(x_e,ff,'Marker','o','LineWidth',1,'Color',[0.0745098039215686 0.623529411764706 1]);
set(gca,'GridAlpha',0.25,'GridLineStyle','--','MinorGridLineStyle','--',...
    'XGrid','on','XMinorGrid','on','YGrid','on');
xlim([0 780]);

CodePudding user response:

You should not use the ecdf function this way, because it expects the raw data values (the individual observations) as input. Your inputs, on the other hand, seem to be the population values and their absolute frequencies. So you only need to normalize the frequencies to make them relative and then compute their cumulative sum.
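To illustrate the difference, here is a small sketch using the table from the question. It assumes the Statistics and Machine Learning Toolbox is available and uses the 'Frequency' option of ecdf; the variable names (fWrong, fFreq, sample, and so on) are made up for this example. Calling ecdf(y) treats the seven counts as if they were seven observations, whereas passing the counts as frequencies, or rebuilding the raw sample with repelem, describes the tabulated distribution.

```matlab
x = [150; 250; 350; 450; 550; 650; 750];   % values from the question's table
y = [1; 2; 8; 30; 18; 16; 5];              % count for each value

% What the question's code does: the counts themselves are treated as a sample,
% so the returned evaluation points are the sorted counts, not the x values.
[fWrong, vWrong] = ecdf(y);

% Passing the counts as observation frequencies instead.
[fFreq, xFreq] = ecdf(x, 'Frequency', y);

% Same idea by hand: rebuild the raw sample by repeating x(i) y(i) times.
sample = repelem(x, y);                    % 80 observations in total
[fRaw, xRaw] = ecdf(sample);
```

If the toolbox is not available, the normalize-and-cumsum approach needs only base MATLAB.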

When plotting, I suggest you add one extra x value before the first population value and one after the last, with cumulative values 0 and 1 respectively, for a clearer graph.

x = [150; 250; 350; 450; 550; 650; 750];
y = [1; 2; 8; 30; 18; 16; 5]; % example data
cdf = cumsum(y./sum(y)); % normalize, then compute cumulative sum
stairs([100; x; 900], [0; cdf; 1], 'linewidth', .8), grid on % note two extra values
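As a quick check of the numbers, and as a sketch of how the step function could be evaluated at arbitrary points, something like the following should work in base MATLAB. The helper stepCdf is a hypothetical name introduced here, and the result is stored in Fx rather than cdf only to avoid shadowing the toolbox function of the same name.

```matlab
x = [150; 250; 350; 450; 550; 650; 750];
y = [1; 2; 8; 30; 18; 16; 5];

cumCounts = cumsum(y);        % 1 3 11 41 59 75 80
Fx = cumCounts / sum(y);      % 0.0125 0.0375 0.1375 0.5125 0.7375 0.9375 1

% Hypothetical helper: value of the step CDF at each query point in t,
% i.e. the fraction of the total count located at x values <= t.
stepCdf = @(t) arrayfun(@(q) sum(y(x <= q)), t) / sum(y);

vals = stepCdf([100 500 800]);   % 0, 41/80 = 0.5125, 1
```

Here stepCdf(500) counts the entries at 150, 250, 350 and 450, which gives (1 + 2 + 8 + 30)/80 = 0.5125.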
