I have written a function for matrix multiplication where matrices are defined using vectors of vectors containing double values.
vector<vector<double> > mat_mul(vector<vector<double> >A, vector<vector<double> >B){
vector<vector<double> >result(A.size(),vector<double>(B[0].size(),0));
if(A[0].size()==B.size()){
const int N=A[0].size();
for(int i=0;i<A.size();i ){
for(int j=0;j<B[0].size();j ){
for(int k=0;k<A[0].size();k )
result[i][j] =A[i][k]*B[k][j];
}
}
}
return result;
}
The code seems to work fine but is very slow for matrices large as 400X400. How can we speed this up? I have seen other answers but they discuss matrix multiplication but not about any speed up for vector of vectors.
Any help is highly appreciated.
CodePudding user response:
You are using the naive algorithm for matrix multiplication. It’s extremely cache unfriendly, and you are hit by the full latency.
First, split up the operations so your inner loop repeatedly accessed data that fit into your cache.
Second, calculate four sums simultaneously to avoid penalties for latency.
Third, use fma instructions (fused multiply-add) which calculate a product and a sum in the same time as a product.
Fourth, use vector registers.
Five, use multiple threads.
Or just use a package like linpack optimising things for you.
CodePudding user response:
struct matrix:
vector<double>
{
using base = vector<double>
using size_type=std::pair<std::size_t,std::size_t>;
void resize(size_type const& dims, double const v=0){
rows = dims.second;
base::resize(dims.first * rows);
};
size_type size() const { return { cols(), rows }; };
double& operator[](size_type const& idx) {
base& vec=*this;
return vec[idx.first idx.second * cols()];
};
double operator[](size_type const& idx) const {
base const& vec=*this;
return vec[idx.first idx.second * cols()];
};
private:
std::size_t cols() const { return base::size() / rows; };
std::size_t rows = 0;
};
///...
auto const size = std::tuple_cat(A.size(), std::make_tuple(B.size().second));
matrix result;
result.resize({get<0>(size), get<2>(size)});
for(auto i = 0; i < get<0>(size); i)
for(auto j = 0; j < get<1>(size); j)
for(auto k = 0; k < get<2>(size); k)
result[{i,k}] = A[{i,j}] * B[{j,k}];
I just skipped lots of details, such as none-dedault constructors which is needed if you want a pretty initialization syntax. Moreover as a matrix, this type will need lots of arithmetics operators.
Another approach would be type_erased 2D raw array, but that would require defining assignment operator, as well as copy and move constructors. So this std::vector
based solution seems to be the simplest implementation.
Also, if the dimensions are fixed at compile-time, a template alias can do:
template<typename T, std::size_t cols, std::size_t rows>
using array_2d = std::array<std::array<double, rows>, cols>;
array_2d<double, icount, kcount> result{};