I have a row vector q with 200 elements, and another row vector, dij, which is the output of the pdist function. It currently has 48216200 elements, but I'd like to be able to go higher. The operation I want to do is essentially:
t = sum(q'*dij, 2);
However, since this tries to allocate a 200x48211290 array, MATLAB complains that it would require 70 GB of memory. Therefore I do it this way instead:
t = zeros(numel(q), 1);
for i = 1:numel(q)
    qi = q(i);
    factor = qi * dij;   % scale all of dij by q(i); still allocates a full-size temporary
    t(i) = sum(factor);
end
However, this takes too much time. By too much time, I mean about 36 s, which is orders of magnitude longer than the time required by the pdist call itself. Is there a way to speed up this operation without explicitly allocating so much memory? I'm assuming that if the first version could allocate the memory, it would be faster, since it is a single vectorized operation.
CodePudding user response:
Just use the distributive property of multiplication with respect to addition: row i of the outer product q'*dij is q(i)*dij, so its row sum is q(i)*sum(dij). The huge 200-row intermediate is never needed:
t = q'*sum(dij);
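A quick sanity check with small, made-up vectors (the sizes here are arbitrary stand-ins for the real 200-element q and the pdist output) shows the two forms agree to rounding error:

q   = rand(1, 5);          % stand-in for the 200-element q
dij = rand(1, 12);         % stand-in for the pdist output
t1  = sum(q'*dij, 2);      % memory-hungry original form
t2  = q'*sum(dij);         % distributive form: sum(dij) is a scalar
max(abs(t1 - t2))          % on the order of eps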
CodePudding user response:
To test what Cris said in the comment on the question, I created three .m files, as follows.
vec.m:
res = sum(sin(d.*q')./(d.*q'));   % implicit expansion: d.*q' is a 4e6x200 matrix, built twice
forloop.m:
for i = 1:200
    res(i) = sum(sin(d.*q(i))./(d.*q(i)));   % d.*q(i) is computed twice per iteration
end
and test.m:
clc
clear all
d = rand(4e6, 1);
q = rand(200, 1);
res = zeros(1, 200);
% alternate the two versions, three calls each,
% so neither benefits unfairly from warm-up
forloop;
vec;
forloop;
vec;
forloop;
vec;
Then I profiled test.m with MATLAB's Run and Time profiler, and the results were very surprising:

3 calls to forloop: ~10.5 s
3 calls to vec: 15.5 s (!)

Additionally, when I converted the data to single, the results were:

3 calls to forloop: 7.5 s
3 calls to vec: 8.5 s
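One caveat (my own assumption, not something verified above): the Run and Time profiler instruments every line, which can distort a loop-vs-vectorized comparison. timeit gives cleaner per-call numbers. A minimal sketch, with the loop wrapped in a local function since timeit wants a zero-argument handle (in a script, local functions must sit at the end of the file and need R2016b or newer):

d = rand(4e6, 1);
q = rand(200, 1);
t_vec  = timeit(@() sum(sin(d.*q')./(d.*q')))   % vectorized version
t_loop = timeit(@() loopver(d, q))              % loop version

function res = loopver(d, q)   % same body as forloop.m
    res = zeros(1, numel(q));
    for i = 1:numel(q)
        res(i) = sum(sin(d.*q(i))./(d.*q(i)));
    end
end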
I don't know precisely why the for-loop is faster in these scenarios, but as for your problem, you could speed things up by creating fewer temporary variables inside the loop, using column vectors (I think), and, finally, converting your data to single, as sketched after the snippet below:
q=single(rand(200,1));
...
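Applied to the loop from the question, that advice might look like the following sketch (my adaptation, untested on the real data; whether single precision is acceptable for dij is for you to judge):

q   = single(q(:));                 % force a column vector
dij = single(dij);
t   = zeros(numel(q), 1, 'single');
for i = 1:numel(q)
    t(i) = sum(q(i) * dij);         % no intermediate 'factor' variable
end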