Optimizing matrix calculations in for loops in Octave-CodePudding

I imported code from Matlab to Octave and the speed of certain functions seems to have dropped. I looked into vectorization and could not come up with a solution with my limited knowledge. What i want to ask, is there a way to speed this up?

    n = 181;
    N = 250;
    for i=1:n    
        for j=1:n
            par=0;
            for k=1:N;
               par=par log2(1 (10.^(matrix1(j,i,matrix2(j,i))./10)./(matrix3(j,i).*double1 double2)));
            end
            resultingMatrix(j,i)=2.^((1/N).*par)-1;
        end
    end

Where dimensions are:

    matrix1 = 181x181x2,  
    matrix2 = 181x181 -->  containing values either 1 or 2 only,  
    matrix3 = 181x181,  
    double1, double2 = just doubles

CodePudding user response：

Here's my testing code, I've completed your code by making some random matrices:

n = 181;
N = 250;
matrix1 = rand(n,n,2);
matrix2 = randi(2,n,n);
matrix3 = rand(n,n);
double1 = 1;
double2 = 1;
tic
for i=1:n
   for j=1:n
      par=0;
      for k=1:N
         par=par log2(1 (10.^(matrix1(j,i,matrix2(j,i))./10)./(matrix3(j,i).*double1 double2)));
      end
      resultingMatrix(j,i)=2.^((1/N).*par)-1;
   end
end
toc

Note that the code inside the loop over k doesn't use k. This makes the loop superfluous. We can easily remove it. The loop does the same computation 250 times, adds up the results, and divides by 250, yielding the value of one of the repeated computations.

Another important thing to do is preallocate resultingMatrix, to avoid it growing with every loop iteration.

This is the resulting code:

tic
resultingMatrix2 = zeros(n,n);
for i=1:n
   for j=1:n
      par=log2(1 (10.^(matrix1(j,i,matrix2(j,i))./10)./(matrix3(j,i).*double1 double2)));
      resultingMatrix2(j,i)=2.^par-1;
   end
end
toc
max(abs((resultingMatrix(:)-resultingMatrix2(:))./resultingMatrix(:)))

The last line computes the maximum relative difference. It is 9.9424e-15 in my version of Octave. It will differ depending on the version, the system, and more. This error is the floating-point rounding error. Note that the original code, adding the same value 250 times, and then dividing it by 250, will produce a larger rounding error than the modified code. For example,

x = pi;
t = 0;
for i = 1:N
   t = t   x;
end;
t = t / N;
t-x

gives -8.4377e-15, a similar rounding error to what we saw above.

The original code took 81.5 s, the modified code takes only 0.4 s. This is not a gain of vectorization, it is just a gain of preallocation and not needlessly repeating the same computation over and over again.

Next, we can remove the other two loops by vectorizing the operations. The difficult bit here is matrix1(j,i,matrix2(j,i)). We can produce each of the n*n linear indices with (1:n*n).' (matrix2(:)-1)*(n*n). This is not trivial, I suggest you think about how this computation works. You need to know that linear indices count, starting at 1 for the top-left array element, first down, then right, then along the 3rd dimension. So 1:n*n is simply the linear indices for each of the elements of a 2D array, in order. To each of these we add n*n if we need to access the 2nd element along the 3rd dimension.

We now have the code

tic
index = reshape((1:n*n).'   (matrix2(:)-1)*(n*n), n, n);
par = log2(1 (10.^(matrix1(index)./10)./(matrix3.*double1 double2)));
resultingMatrix3 = 2.^par-1;
toc
max(abs((resultingMatrix(:)-resultingMatrix3(:))./resultingMatrix(:)))

This code produces the exact same result as my previous version, and runs in only 0.013 s, 30 times faster than the non-vectorized code, and 6000 times faster than the original code.