Which bins are occupied in a 3D histogram in MATLAB

I have 3D data from which I need to calculate properties. To reduce computation, I want to discretize the space, calculate the properties per bin instead of per data point, and then reassign the property calculated for each bin back to the data points in it. I also only want to calculate the bins that contain at least one point. Since MATLAB has no 3D binning function, I use histcounts on each dimension separately and then search for the unique bins that have been assigned to the data points.

a5pre=compositions(:,1);
a7pre=compositions(:,2);
a8pre=compositions(:,3);
%% BINNING

a5pre_edges=[0,linspace(0.005,0.995,19),1];
a5pre_val=(a5pre_edges(1:end-1) + a5pre_edges(2:end))/2;
a5pre_val(1)=0;
a5pre_val(end)=1;

a7pre_edges=[0,linspace(0.005,0.995,49),1];
a7pre_val=(a7pre_edges(1:end-1) + a7pre_edges(2:end))/2;
a7pre_val(1)=0;
a7pre_val(end)=1;

a8pre_edges=a7pre_edges;
a8pre_val=a7pre_val;

[~,~,bin1]=histcounts(a5pre,a5pre_edges); 
[~,~,bin2]=histcounts(a7pre,a7pre_edges); 
[~,~,bin3]=histcounts(a8pre,a8pre_edges); 

bins=[bin1,bin2,bin3];

[A,~,C]=unique(bins,'rows','stable');

a5pre=a5pre_val(A(:,1));
a7pre=a7pre_val(A(:,2));
a8pre=a8pre_val(A(:,3));
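The step the question describes after binning, reassigning a per-bin result back to the original points, can come straight from the third output `C` of `unique`: `C(k)` is the row of `A` that point `k` falls into, so indexing a per-bin result with `C` expands it to one value per point. A minimal sketch, assuming random data in place of `compositions`; `binProperty` is a hypothetical stand-in for the real property calculation:

```matlab
rng(0);                                  % reproducible stand-in data
compositions = rand(1000, 3);            % hypothetical N-by-3 input in (0,1)

% Hypothetical per-bin property, standing in for the expensive calculation
binProperty = @(a5, a7, a8) a5 + 2*a7 + 3*a8;

% Edges and bin values as in the question
a5pre_edges = [0, linspace(0.005, 0.995, 19), 1];
a7pre_edges = [0, linspace(0.005, 0.995, 49), 1];
a8pre_edges = a7pre_edges;
a5pre_val = (a5pre_edges(1:end-1) + a5pre_edges(2:end)) / 2;
a5pre_val([1 end]) = [0 1];
a7pre_val = (a7pre_edges(1:end-1) + a7pre_edges(2:end)) / 2;
a7pre_val([1 end]) = [0 1];
a8pre_val = a7pre_val;

[~, ~, bin1] = histcounts(compositions(:,1), a5pre_edges);
[~, ~, bin2] = histcounts(compositions(:,2), a7pre_edges);
[~, ~, bin3] = histcounts(compositions(:,3), a8pre_edges);
[A, ~, C] = unique([bin1, bin2, bin3], 'rows', 'stable');

% Evaluate the property once per occupied bin (one entry per row of A)
propPerBin = binProperty(a5pre_val(A(:,1)), a7pre_val(A(:,2)), a8pre_val(A(:,3)));

% Expand back to one value per data point: point k uses bin C(k)
propPerPoint = reshape(propPerBin(C), [], 1);
```

The expensive call runs `size(A,1)` times instead of once per point; the expansion with `C` is a plain indexing operation.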

It seems that the unique function is quite time-consuming, so I was wondering whether there is a faster way to do this, given that each row contains only integers, or perhaps a totally different approach.
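Because every row of `bins` holds three positive integers with known upper bounds (the number of bins per dimension), one commonly used alternative is to fold each row into a single linear index with `sub2ind` and call `unique` on that column vector, which avoids the row-wise comparison that `'rows'` performs. A sketch under the question's edge definitions, with random stand-in data for `compositions` (`gridSize` and `key` are names chosen here); whether it is actually faster on a given dataset should be measured:

```matlab
rng(1);
compositions = rand(1e5, 3);             % hypothetical stand-in data

a5pre_edges = [0, linspace(0.005, 0.995, 19), 1];
a7pre_edges = [0, linspace(0.005, 0.995, 49), 1];
a8pre_edges = a7pre_edges;

[~, ~, bin1] = histcounts(compositions(:,1), a5pre_edges);
[~, ~, bin2] = histcounts(compositions(:,2), a7pre_edges);
[~, ~, bin3] = histcounts(compositions(:,3), a8pre_edges);

% Grid size = number of bins per dimension (20 x 50 x 50 here)
gridSize = [numel(a5pre_edges), numel(a7pre_edges), numel(a8pre_edges)] - 1;

% Collapse each (bin1,bin2,bin3) triple into one integer key
key = sub2ind(gridSize, bin1, bin2, bin3);

% unique on a column vector instead of unique(...,'rows')
[keyA, ~, C] = unique(key, 'stable');

% Recover the three bin indices of each occupied bin
[i1, i2, i3] = ind2sub(gridSize, keyA);
A = [i1, i2, i3];
```

Since the mapping between a row and its key is one-to-one, `A` and `C` come out in the same first-appearance order as with `unique(bins,'rows','stable')`.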

Best regards

User response:

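function [comps,C]=compo_binner(x,y,z,e1,e2,e3,v1,v2,v3)
% x,y,z: point coordinates; e1,e2,e3: bin edges; v1,v2,v3: bin values

C=NaN(length(x),1);
comps=NaN(length(x),3);
id=1;
B_temp=zeros(1,3);

for i=1:numel(x)
    % Bin index = number of edges <= value, clamped so a value equal to
    % the last edge lands in the last bin (as histcounts does)
    B_temp(1)=v1(min(sum(x(i)>=e1),numel(v1)));
    B_temp(2)=v2(min(sum(y(i)>=e2),numel(v2)));
    B_temp(3)=v3(min(sum(z(i)>=e3),numel(v3)));

    % Exact row match; ismember without 'rows' tests each element separately
    C_id=find(ismember(comps,B_temp,'rows'),1);
    if ~isempty(C_id)
        C(i)=C_id;
    else
        % New bin: store it and point i at it
        comps(id,:)=B_temp;
        C(i)=id;
        id=id+1;
    end

end

comps(any(isnan(comps), 2), :) = [];

end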

But it's much slower than the histcounts/unique version. I can't avoid the find function, and that's a function you definitely want to avoid in a loop when speed matters...
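The per-point edge counting in the loop is what `discretize` does for a whole vector in one call, and the search for an already-seen bin is what `unique(...,'rows','stable')` does, so the loop can be vectorized without `find`. A sketch assuming random stand-in data and the question's edges (`discretize` needs R2015a or later):

```matlab
rng(2);
N = 500;
x = rand(N,1); y = rand(N,1); z = rand(N,1);   % hypothetical stand-in data
e1 = [0, linspace(0.005, 0.995, 19), 1];
e2 = [0, linspace(0.005, 0.995, 49), 1];
e3 = e2;
% Bin values as column vectors: midpoints, end bins pinned to 0 and 1
v1 = ((e1(1:end-1) + e1(2:end)) / 2).';  v1([1 end]) = [0 1];
v2 = ((e2(1:end-1) + e2(2:end)) / 2).';  v2([1 end]) = [0 1];
v3 = v2;

% One call per dimension replaces the per-point edge counting
b1 = discretize(x, e1);
b2 = discretize(y, e2);
b3 = discretize(z, e3);

% unique replaces the ismember/find search: each occupied bin once, in
% order of first appearance, and C maps every point to its bin
[A, ~, C] = unique([b1, b2, b3], 'rows', 'stable');
comps = [v1(A(:,1)), v2(A(:,2)), v3(A(:,3))];
```

Like `histcounts`, `discretize` treats each bin as closed on the left and puts a value equal to the last edge into the last bin, so both give the same indices here.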

User response:

If I understand correctly, you want to compute a 3D histogram. If there's no built-in tool to compute one, it is simple to write one:

function [H, lindices] = histogram3d(data, n)
% histogram3d   3D histogram
%    H = histogram3d(data, n) computes a 3D histogram from (x,y,z) values
%    in the Nx3 array `data`. `n` is the number of bins between 0 and 1.
%    It is assumed all values in `data` are between 0 and 1.
assert(size(data,2) == 3, 'data must be Nx3');
H = zeros(n, n, n);
indices = floor(data * n) + 1;
indices(indices > n) = n;
lindices = sub2ind(size(H), indices(:,1), indices(:,2), indices(:,3));
for ii = 1:size(data,1)
   H(lindices(ii)) = H(lindices(ii)) + 1;
end
end
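The counting loop at the end of `histogram3d` can also be written as a single `accumarray` call, which adds a 1 at each linear bin index. A sketch of just that step, with made-up data and the same `n`, `indices`, and `lindices` as inside `histogram3d`:

```matlab
rng(3);
n = 20;
data = rand(1e4, 3);                         % hypothetical Nx3 data in [0,1)
indices = floor(data * n) + 1;
indices(indices > n) = n;                    % a value of exactly 1 goes to the last bin
lindices = sub2ind([n n n], indices(:,1), indices(:,2), indices(:,3));

% Count points per linear bin index; size [n^3 1] keeps empty trailing bins
H = reshape(accumarray(lindices, 1, [n^3, 1]), [n n n]);
```

Passing the full size `[n^3, 1]` matters: without it, `accumarray` stops at the largest occupied index and the `reshape` would fail.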

Now, given your compositions array, and binning each dimension into 20 bins, we get:

[H, indices] = histogram3d(compositions, 20);
idx = find(H);
[x,y,z] = ind2sub(size(H), idx);
reduced_compositions = ([x,y,z] - 0.5) / 20;

The bin centers for H are at ((1:20)-0.5)/20.
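Per coordinate, the binning in `histogram3d` and the centers used for `reduced_compositions` can be written as follows, for a value x in [0,1] and n bins per dimension. The last inequality bounds how far a point can be from the center of its bin:

$$
k = \min\left(\lfloor n x \rfloor + 1,\ n\right), \qquad
c_k = \frac{k - \tfrac{1}{2}}{n}, \qquad
\lvert x - c_k \rvert \le \frac{1}{2n}
$$

For example, with n = 20 and x = 0.437: nx = 8.74, so k = 9 and c_9 = 8.5/20 = 0.425, which is 0.012 away from x, within the bound 1/40 = 0.025.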

On my machine this runs in a fraction of a second for 5 million input points.

Now, for each row compositions(ii,:), you have a number indices(ii), which matches another number idx(jj), corresponding to reduced_compositions(jj,:). One easy way to assign the results is as follows:

H(H > 0) = 1:numel(idx);
indices = H(indices);

Now, for each row compositions(ii,:), the closest match in the reduced set is reduced_compositions(indices(ii),:).
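Putting the pieces together, the relabelled `H` lets a result computed once per occupied bin be spread back to every point. A self-contained sketch with random stand-in data; the per-bin result (sum of the center coordinates) is a hypothetical placeholder for the real property:

```matlab
rng(4);
n = 20;
compositions = rand(2000, 3);                % hypothetical stand-in data

% Binning and counting as in histogram3d
sub = min(floor(compositions * n) + 1, n);
lin = sub2ind([n n n], sub(:,1), sub(:,2), sub(:,3));
H = reshape(accumarray(lin, 1, [n^3, 1]), [n n n]);

% Occupied bins and their centers
idx = find(H);
[x, y, z] = ind2sub(size(H), idx);
reduced_compositions = ([x, y, z] - 0.5) / n;

% Relabel occupied bins 1..numel(idx) (same order as find) and look up each point
H(H > 0) = 1:numel(idx);
indices = H(lin);

% Hypothetical per-bin result, spread back to one value per point
propPerBin = sum(reduced_compositions, 2);
propPerPoint = propPerBin(indices);
```

The relabelling works because logical indexing and `find` both visit the elements of `H` in the same column-major order.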