This is my code:
variables=1000;
t=20;
x=zeros(t,t,3);
y=rand(variables,3);
z=rand(t,t,variables);
e=rand(variables,1);
for c=1:variables
x(:,:,1)=x(:,:,1) y(c,1).*((z(:,:,c)-e(c)).^2);
x(:,:,2)=x(:,:,2) y(c,2).*((z(:,:,c)-e(c)).^2);
x(:,:,3)=x(:,:,3) y(c,3).*((z(:,:,c)-e(c)).^2);
end
How can I improve calculation speed on this loop? I think that the problem is the for
loop with a large c
.
CodePudding user response:
It's a myth, but alas a persistent one, that loops are slow in MATLAB. As you've written your for
loop, it goes sequentially through the last dimension of your variables. That pretty much translates to a FORTRAN loop directly, leaving little room for improvement using vectorisation. The below does vectorise your output as much as possible, but doesn't improve performance much, even thoughreshape()
is almost free, and severely degrades readability.
In each iteration, all you're doing is calculating y(c,1).*((z(:,:,c)-e(c)).^2)
, which is added to the total. If we are able to vectorise that expression, we can sum over the dimension of c
to get rid of the loop.
z(:,:,c)-e(c)
can be vectorised by adding two singleton dimensions to e
: reshape(e, [1 1 numel(e)])
, then subtract and power by 2 as usual.
Multiplication by y(c,1)
also works, if we add two singleton dimensions to y(:,1)
:, reshape(y(:,1), [1 1 numel(e)])
, then multiply again as usual.
Finally, we just need to sum over our 3rd dimension and we end up with our t
-by- t
result: sum(tmp2, 3)
.
All that's left are the hardcoded three dimensions in x
, which I've left be in a loop.
The working code on R2007b:
variables=10;
t=2;
x=zeros(t,t,3);
y=rand(variables,3);
z=rand(t,t,variables);
e=rand(variables,1);
for ii = 1:size(x, 3)
x(:, :, ii) = sum(bsxfun(@times, reshape(y(:,1), [1 1 numel(e)]), bsxfun(@minus, z, reshape(e, [1 1 numel(e)])).^2), 3);
end
I wasn't sure what to do with the hardcoded dimension of 3
, so I just left a loop over that. The rest is vectorised away, thanks to a few permute()
calls to arrange the dimensions for the bsxfun()
expansion.
Code for >R2016b with implicit expansion:
for ii = 1:size(x, 3)
x(:, :, ii) = sum(reshape(y(:,ii), [1 1 numel(e)]) .* (z - reshape(e, [1 1 numel(e)])).^2, 3)
end
A quick timing comparison shows that this is roughly 2x faster than your original loop:
Elapsed time is 0.780516 seconds. Original code
Elapsed time is 0.397369 seconds. My bsxfun() solution
Elapsed time is 0.305160 seconds. My implicit expansion
Note that in the above a 100 loops were ran for each code version, i.e. timings are 8ms, 4ms and 3ms per version.
For a introduction to reshape()
you can refer to this answer of mine.
The documentation article on implicit broadcasting is rather good, as is this blog.