I have a large matrix [300000, 64] ( tests )
I need the autocorrelation of each row independent of all the others
result would be [300000 , 127 ]
I do it this way
for i = 1:rw
xcorrresults(i,:) = xcorr(tests(i,:));
end
but it takes most all the time of the program.
Is there a way to vectorize the loop?
CodePudding user response:
You can save a significant amount of runtime by pre-allocating the output (the larger your input, the more significant)
xcorrresults2 = zeros(rw,2*cl-1);
for i = 1:rw
xcorrresults2(i,:) = xcorr(tests(i,:));
end
Internally, xcorr
deals with everything as a column array, and does some additional manipulation if you give it a row. Giving it a column up front saves (surprisingly) a further 35% for me
xcorrresults3 = zeros(rw,2*cl-1);
for i = 1:rw
xcorrresults3(i,:) = xcorr(tests(i,:).').';
end
You could use profile
to drill down into the xcorr
function if desired. For example around 20% of the runtime for me is for an internal loop of that function to determine the transform length - if your data is a fixed size then this can be determined once and used as an input if you made a similar but custom function.
CodePudding user response:
I suspected the long run time was due to call overhead of doing xcorr for all my sample sequences in seperate calls.
I tried Cris suggestion above.
First i transposed my array so that signal realizations were in columns with lots of rows. Then padded with zeros so the FFT would be linear.
Then FFT this. In octave FFT of a matrix works on the columns.
Then IFFT the product of FFT result with its conjugate
tests = [ zeros(rw,64) tests]; % padding
T = fft(tests');
P = fftshift(abs(ifft(T.*conj(T))))';
using Xcorr on each sample with typically 30000 sequences to check was 19 sec.
using the IFFT(FFT(x) .* CONJ(FFT(X)) finished usually around 0.35 seconds.
Now that's more like it..
CodePudding user response:
This computation is already heavy, but MATLAB algorithm for xcorr
is fortunately efficient enough, it uses fft
internally. You can still gain near 4X speedup using parallelization on a 4-core machine. Just use parfor
instead of for
and use transposed arrays to work column-wise in optimum memory order.
clc, clear
tests = ones(300000, 64)';
[m, n] = size(tests);
xcorrresults = zeros(2*m-1, n);
parfor i = 1:n
xcorrresults(:,i) = xcorr(tests(:,i));
end