Problem
In MATLAB I have 2 1-D array of the same size with integer values. Considering that the values at a same index are a pair, I want to count how many pairs of each I have and store that in a matrix to later print it a a heatmap.
Here is a small example of what I want to archive:
v1 = [1, 0, 1, 0, 1];
v2 = [1, 1, 0, 1, 1];
% Magic happens here
heatMatrix = [0, 2;
1, 2];
The result is as follow because we have:
- 0 time the pair v1=0, v2=0 (heatMatrix[1, 1])
- 2 times the pair v1=0, v2=1 (indices 2 and 4, heatMatrix[1, 2])
- 1 time the pair v1=1, v2=0 (index 3, heatMatrix[2, 1])
- 2 time the pair v1=1, v2=1 (indices 1 and 5, heatMatrix[2, 2])
My code
I have done that using MATLAB loops which is slower than reasonable especially for long vectors:
% Alocating random vectors
rng(0);
maxVal = 255;
nbElem = 1e6;
v1 = randi(nbElem, 1);
v2 = randi(nbElem, 1);
heatMatrix = zeros(maxVal, maxVal);
% Creating heat matrix
tic();
for ii = 1:maxVal
for jj = 1:maxVal
heatMatrix(ii, jj) = sum((v1 == i) & (v2 == j));
end
end
toc();
Performance issues
I have to deal with long vectors, in the realm of 1e7 elements. I am working with an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz and have 7.5 GB of RAM. And my performances are as follow:
Question
I am looking for a way to speed up the execution of this task. I'm mostly looking for a way to vector or perform the task in a smarter way (maybe there is a MATLAB function I'm not aware of).
CodePudding user response:
A simple way to do this is to:
Realize the value pairs in v1 and v2 are the indexes in your heatMatrix and transform them to linear indexes
Use
unique
to get the unique indexes, as well as the corresponding indexes in the input linear indexes vectorCalculate the repetitions of each index in
ix
withaccumarray
Use the
unique_ind
indexes to fill the matrix
%
% Alocating random vectors
rng(0);
maxVal = 255;
nbElem = 1e6;
v1 = randi(maxVal,nbElem,1);
v2 = randi(maxVal,nbElem,1);
heatMatrix = zeros(maxVal, maxVal);
heatMatrix2 = zeros(maxVal, maxVal);
% Creating heat matrix
tic();
for ii = 1:maxVal
for jj = 1:maxVal
heatMatrix(ii, jj) = sum((v1 == ii) & (v2 == jj));
end
end
toc();
% faster
tic();
ind = sub2ind([maxVal maxVal],v1,v2);
[unique_ind,~,ix] = unique(ind);
C = accumarray(ix,1).';
heatMatrix2(unique_ind) = C;
toc();
isequal(heatmatrix,heatmatrix2)
%%%% Elapsed time is 24.850305 seconds.
%%%% Elapsed time is 0.049509 seconds.
%%%% 1
If, as I suspect, your values actually range from 0 to maxVal
:
% Alocating random vectors
rng(0);
maxVal = 255;
nbElem = 1e6;
v1 = randi(maxVal 1,nbElem,1)-1;
v2 = randi(maxVal 1,nbElem,1)-1;
heatMatrix = zeros(maxVal 1, maxVal 1);
heatMatrix2 = zeros(maxVal 1, maxVal 1);
% Creating heat matrix
tic();
for ii = 1:(maxVal 1)
for jj = 1:(maxVal 1)
heatMatrix(ii, jj) = sum((v1 == (ii-1)) & (v2 == (jj-1)));
end
end
toc();
% faster
tic();
ind = sub2ind([maxVal 1 maxVal 1],v1 1,v2 1);
[unique_ind,~,ix] = unique(ind);
C = accumarray(ix,1).';
heatMatrix2(unique_ind) = C;
toc();
isequal(heatMatrix,heatMatrix2)
%%%% Elapsed time is 24.748371 seconds.
%%%% Elapsed time is 0.055760 seconds.
%%%% 1
CodePudding user response:
Option 1: accumarray()
accumarray()
can handle a 2D index, so:
v1 = [1, 0, 1, 0, 1];
v2 = [1, 1, 0, 1, 1];
hm = accumarray([v1;v2].' 1,1)
% hm = [0, 2;
% 1, 2];
Just make sure that your index start with 1.
Option 2: sparse()
sparse()
can handle duplicate index, so:
v1 = [1, 0, 1, 0, 1];
v2 = [1, 1, 0, 1, 1];
hm = full(sparse(v1 1,v2 1,1))
% hm = [0, 2;
% 1, 2];
Both options are way faster than your solution and faster than the @BillBokeey's solution. But it looks like accumarray is a bit faster.