Performance of updating/inserting into a sparse matrix in Matlab?-CodePudding

i have written a fairly large class for the calculation of measurement uncertainties, but it is painfully slow. Profiling the code shows that the slowest operation, by far, is to insert the computation results into a large sparse matrix. About 97% of all time is spent on that operation. The matrix keeps the uncertainties of all measurement data, and I cannot change the data structures without breaking a lot of other code. So my only option is to optimize the data insertion step. This is done about 5700 times in my benchmark, and every time the amout of data increases.

First solution, extremely slow:

% this automatically sums up duplicate yInd entries
[zInd_grid, yInd_grid] = ndgrid(1:numel(z), yInd(:));
Uzw = sparse(zInd_grid(:), yInd_grid(:), Uzy(:), numel(z), numel(obj.w));

% this automatically sums up duplicate yInd entries
dz_dw = sparse(zInd_grid(:), yInd_grid(:), dz_dy(:), numel(z), numel(obj.w));

obj.w = [obj.w; z(:)]; % insert new measurement results into the column vector obj.w
obj.Uww = [obj.Uww, transpose(Uzw); Uzw, Uzz]; % insert new uncertainties of the measurement results
obj.dw_dw = [obj.dw_dw, transpose(dz_dw); dz_dw, dz_dz]; % insert "dependencies" of new measurements on old results

The line obj.Uww = [obj.Uww, transpose(Uzw); Uzw, Uzz]; is the slowest, by far. Perhaps it is slow because Matlab needs to allocate a new, larger buffer for obj.Uww and copy everything over. Thus I changed the code to the following:

% Preallocation in the class constructor
obj.w = spalloc(nnz_w, 1, nnz_w);
obj.Uww = spalloc(nnz_w, nnz_w, nnz_Uww);
obj.dw_dw = spalloc(nnz_w, nnz_w, nnz_dw_dw);
obj.num_w = 0; % to manually keep track of the "real" size of obj.w, obj.Uww and obj.dw_dw

The class constructor is called with the sizes the three properties w, Uww and dw_dw will have at the end of the computation (nzz_w is approximately 0.1 million, nzz_Uww is at 9 million and nzz_dw_dw is about 1.6 million). Thus, no new allocation of memory should be needed. This is the inserting step now:

% this automatically sums up duplicate yInd entries
[zInd_grid, yInd_grid] = ndgrid(1:numel(z), yInd(:));
Uzw = sparse(zInd_grid(:), yInd_grid(:), Uzy(:), numel(z), obj.num_w);

% this automatically sums up duplicate yInd entries
dz_dw = sparse(zInd_grid(:), yInd_grid(:), dz_dy(:), numel(z), obj.num_w);

wInd = 1:obj.num_w;
obj.w(zInd, 1) = z(:); % insert new measurement results
obj.num_w = zInd(end); % new "real" size of w, Uww and dw_dw
obj.Uww(zInd, wInd) = Uzw; % about 51% of all computation time
obj.Uww(wInd, zInd) = transpose(Uzw); % about 15% of all computation time
obj.Uww(zInd, zInd) = Uzz; % about 14.4% of all computation time
obj.dw_dw(zInd, wInd) = dz_dw; % about 13% of all computation time
obj.dw_dw(wInd, zInd) = transpose(dz_dw); % about 3.5% of all computation time
obj.dw_dw(zInd, zInd) = dz_dz; % less than 3.5% of all computation time

Still, these lines account for 97% of all computation time, and no speed improvement. Thus I tried version three:

obj.w = [obj.w; z(:)];

[zInd_zy, yInd_zy] = ndgrid(zInd, yInd(:));
[zzInd_i, zzInd_j] = ndgrid(zInd, zInd);

[Uww_i, Uww_j, Uww_v] = find(obj.Uww); % 14% of all computation time
Uww_new = sparse( ... % this statement takes 66% of all computation time
    [Uww_i; zInd_zy(:); yInd_zy(:); zzInd_i(:)], ...
    [Uww_j; yInd_zy(:); zInd_zy(:); zzInd_j(:)], ...
    [Uww_v;     Uzy(:);     Uzy(:);     Uzz(:)], ...
    numel(obj.w), numel(obj.w));

[dw_dw_i, dw_dw_j, dw_dw_v] = find(obj.dw_dw);
dw_dw_new = sparse( ... % 14% of all computation time
    [dw_dw_i; zInd_zy(:); yInd_zy(:); zzInd_i(:)], ...
    [dw_dw_j; yInd_zy(:); zInd_zy(:); zzInd_j(:)], ...
    [dw_dw_v;   dz_dy(:);   dz_dy(:);   dz_dz(:)], ...
    numel(obj.w), numel(obj.w));
obj.Uww = Uww_new;
obj.dw_dw = dw_dw_new;

which is even slower thant the two other versions. Why is inserting into an already preallocated array so slow? And how can I speed it up?

(All the matrices are symmetric, but I did not try to exploit that yet.)

CodePudding user response：

I don't understand the details of your update pattern, but keep in mind that Matlab stores sparse matrices internally in compressed-sparse column format. So adding entries in sequence column-by-column is significantly faster than other orders. E.g., on my old version of Matlab (R2006a), this:

n=10000;
nz=400000;
v=floor(n*rand(nz,3)) 1;

fprintf('Random\n');
A=sparse(n, n);
tic
for k=1:nz
  A(v(k,1), v(k,2))=v(k,3);
end
toc

fprintf('Row-wise\n');
v=sortrows(v);
A=sparse(n, n);
tic
for k=1:nz
  A(v(k,1), v(k,2))=v(k,3);
end
toc

fprintf('Column-wise\n');
v=sortrows(v, [2 1]);
A=sparse(n, n);
tic
for k=1:nz
  A(v(k,1), v(k,2))=v(k,3);
end
toc

gives this:

>> sparsetest
Random
Elapsed time is 19.276089 seconds.
Row-wise
Elapsed time is 20.714324 seconds.
Column-wise
Elapsed time is 1.498150 seconds.

Likely best of all would be if you can somehow just collect the nonzeros in a form suitable for spconvert or sparse and then make the whole sparse matrix at the end, but I gather that you might not be able to do that.

CodePudding user response：

@bg2b Pointed out how adding data column-wise is much faster.

It turns out adding rows to a sparse matrix or sparse vector is painfully slow (more precisely, to the lower triangular part). Thus, I now store only the upper triangular part of the sparse matrix, because that is fast. When I need data from the matrix, I recreate it from the upper triangular sparse matrix. See the end of the benchmarking script for this.

This is my benchmarking script. It nicely shows the exponential increase in computation time for adding data.

% Benchmark the extension of sparse matrices of the form
% Uww_new = [ Uww, Uwz; ...
%             Uzw, Uzz];
% where Uzw = transpose(Uwz). Uww and Uzz are always square and symmetric.
close all;
clearvars;

rng(70101557, 'twister'); % the seed is the number of the stack overflow question
density = 0.25;
nZ = 10;
iterations = 5e2;
n = nZ * iterations;
nonzeros = n*n*density;

all_Uzz = cell(iterations, 1);
all_Uwz = cell(iterations, 1);
for k = 1:iterations
    % Uzz must be symmetric!
    Uzz_nonsymmetric = sprand(nZ, nZ, density);
    all_Uzz{k} = (Uzz_nonsymmetric   transpose(Uzz_nonsymmetric))./2;
    all_Uwz{k} = sprand((k-1)*nZ, nZ, density);
end
f = figure();
ax = axes(f);
hold(ax, 'on');
grid(ax, 'on');
xlabel(ax, 'Iteration');
ylabel(ax, 'Elapsed time in seconds.');
h = gobjects(1, 0);

name = 'Optimised.';
fprintf('%s\n', name);
Uww_optimised = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww_optimised, 1)   (1:size(Uzz, 1));
    wInd = 1:size(Uww_optimised, 1);
    Uzz_triu = triu(Uzz);
    Uww_optimised(wInd, zInd) = Uwz; % add columns
    Uww_optimised(zInd, zInd) = Uzz_triu; % add rows and columns
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];

name = 'Only Uwz and Uzz.';
fprintf('%s\n', name);
Uww_wz_zz = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww_wz_zz, 1)   (1:size(Uzz, 1));
    wInd = 1:size(Uww_wz_zz, 1);
    Uzw = transpose(Uwz);
    Uww_wz_zz(wInd, zInd) = Uwz; % add columns
%   Uww_wz_zz(zInd, wInd) = Uzw; % add rows
    Uww_wz_zz(zInd, zInd) = Uzz; % add rows and columns
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];

name = 'Only Uzw and Uzz.';
fprintf('%s\n', name);
Uww_zw_zz = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww_zw_zz, 1)   (1:size(Uzz, 1));
    wInd = 1:size(Uww_zw_zz, 1);
    Uzw = transpose(Uwz);
%   Uww_zw_zz(wInd, zInd) = Uwz;
    Uww_zw_zz(zInd, wInd) = Uzw;
    Uww_zw_zz(zInd, zInd) = Uzz;
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];

name = 'Uzw, Uwz and Uzz.';
fprintf('%s\n', name);
Uww = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww, 1)   (1:size(Uzz, 1));
    wInd = 1:size(Uww, 1);
    Uzw = transpose(Uwz);
    Uww(wInd, zInd) = Uwz;
    Uww(zInd, wInd) = Uzw;
    Uww(zInd, zInd) = Uzz;
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
leg = legend(ax, h, 'Location', 'northwest');

assert(issymmetric(Uww));
assert(istriu(Uww_optimised));
assert(isequal(Uww, Uww_optimised   transpose(triu(Uww_optimised, 1))));

%% Get Uyy from Uww_optimised. Uyy is a symmetric subset of Uww
yInd = randi(size(Uww_optimised, 1), 1, nZ); % indices to extract
[yIndRowInds, yIndColInds] = ndgrid(yInd, yInd);
indsToFlip = yIndRowInds > yIndColInds;
temp = yIndColInds(indsToFlip);
yIndColInds(indsToFlip) = yIndRowInds(indsToFlip);
yIndRowInds(indsToFlip) = temp;
linInd = sub2ind(size(Uww_optimised), yIndRowInds, yIndColInds);
assert(issymmetric(linInd));
Uyy = Uww_optimised(linInd);
assert(issymmetric(Uyy));