Home > other >  Is the timing of MATLAB reliable? If yes, can we reproduce the performance with julia, fortran, etc.
Is the timing of MATLAB reliable? If yes, can we reproduce the performance with julia, fortran, etc.

Time:12-30

Originally this is a problem coming up in mathematica.SE, but since multiple programming languages have involved in the discussion, I think it's better to rephrase it a bit and post it here.

In short, michalkvasnicka found that in the following MATLAB sample

s = 15000;
tic
% for-loop version
H = zeros(s,s);
for c = 1:s
    for r = 1:s
        H(r,c) = 1/(r c-1);
    end
end
toc
%Elapsed time is 1.359625 seconds.... For-loop 

tic;
% vectorized version
c = 1:s;
r = c';
HH=1./(r c-1);
toc
%Elapsed time is 0.047916 seconds.... Vectorized

isequal(H,HH)

the vectorized code piece is more than 25 times faster than the pure for-loop code piece. Though I don't have access to MATLAB so cannot test the sample myself, the timing 1.359625 seems to suggest it's tested on an average PC, just as mine.

But I cannot reproduce the timing with other languages like fortran or julia! (We know, both of them are famous for their performance of numeric calculation. Well, I admit I'm by no means an expert of fortran or julia. )

The followings are the samples I used for test. I'm using a laptop with i7-8565U CPU, Win 10.

fortran

fortran code is compiled with gfortran (TDM-GCC-10.3.0-2, with compile option -Ofast).

program tst
use, intrinsic :: iso_fortran_env
implicit none
integer,parameter::s=15000
integer::r,c
real(real64)::hmn(s,s)
do r=1,s
    do c=1, s
        hmn(r,c)=1._real64/(r   c - 1)
    end do 
end do
print *, hmn(s,s)
end program

compilation timing: 0.2057823 seconds
execution timing: 0.7179657 seconds

julia

Version of julia is 1.6.3.

@time (s=15000; Hmm=[1. /(r c-1) for r=1:s,c=1:s];)

Timing: 0.7945998 seconds

Here comes the question:

  1. Is the timing of MATLAB reliable?

  2. If the answer to 1st question is yes, then how can we reproduce the performance (for 2 GHz CPU, the timing should be around 0.05 seconds) with julia, fortran, or any other programming languages?

CodePudding user response:

tic/toc should be fine, but it looks like the timing is being skewed by memory pre-allocation.

I can reproduce similar timings to your MATLAB example, however

  • On first run (clear workspace)

    • Loop approach takes 2.08 sec
    • Vectorised approach takes 1.04 sec
    • Vectorisation saves 50% execution time
  • On second run (workspace not cleared)

    • Loop approach takes 2.55 sec
    • Vectorised approach takes 0.065 sec
    • Vectorisation "saves" 97.5% execution time

My guess would be that since the loop approach explicitly creates a new matrix via zeros, the memory is reallocated from scratch on every run and you don't see the speed improvement on subsequent runs.

However, when HH remains in memory and the HH=___ line outputs a matrix of the same size, I suspect MATLAB is doing some clever memory allocation to speed up the operation.

We can prove this theory with the following test:

Test Num  |  Workspace cleared  |    s    |  Loop (sec)  |  Vectorised (sec) 
    1     |       Yes           |  15000  |    2.10      |       1.41
    2     |        No           |  15000  |    2.73      |       0.07
    3     |        No           |  15000  |    2.50      |       0.07
    4     |        No           |  15001  |    2.74      |       1.73

See the variation between tests 2 and 3, this is why timeit would have been helpful for an average runtime (see footnote). The difference in output sizes between tests 3 and 4 are pretty small, but the execution time returns to a similar magnitude of that in test 1 for the vectorised approach, suggesting that the re-allocation to create HH costs most of the time.


Footnote: tic/toc timings in MATLAB can be improved by using the in-built timeit function, which essentially takes an average over several runs. One interesting thing to observe from the workings of timeit though is that it explicitly "warms up" (quoting a comment) the tic/toc function by calling it a couple of times. You can see when running tic/toc a few times from a clear workspace (with no intermediate code) that the first call takes longer than subsequent calls, as there must be some overhead for getting the timer initialised.

CodePudding user response:

I hope that the following modified benchmark could bring some new light to the problem:

s = 15000;

tic
% for-loop version
H = zeros(s,s);
for i =1:10
  for c = 1:s
    for r = 1:s
        H(r,c) = H(r,c)   1/(r c-1 i);
    end
  end
end
toc

tic;
% vectorized version
HH = zeros(s,s);
c = 1:s;
r = c';
for i=1:10
   HH= HH   1./(r c-1 i);
end
toc

isequal(H,HH)

In this case any kind of "cashing" is avoided by changing of matrix H (HH) at each for-loop (over "i") iteration.

I this case we get:

Elapsed time is 3.737275 seconds. (for-loop)
Elapsed time is 1.143387 seconds. (vectorized)

So, there is still performance improvement (~ 3x) due to the vectorization, which is probably done by implicit multi-threading implementation of vectorized Matlab commands.

Yes, tic/toc vs timeit is not strictly consistent, but the overall timing functionality is very similar.

  • Related