Eigen very slow when chaining operations

Time:01-27

While trying to compute the variance of row vectors in large matrices, I noticed some odd behavior with Eigen. If I chain all the required operations, performance is extremely slow, whereas computing a partial result first and then performing the exact same operations is much faster. This seems to go against the Eigen docs/FAQ, which say to avoid temporaries.

So my question is whether there is a known pitfall in the library that I should avoid, and how to spot situations where this kind of slowdown might occur.

Here's the code I used to test this. I compiled it with MSVC (-O2 optimizations) and MinGW GCC (-O3) on Windows. The "row variance with partial eval" version runs in around 560ms with GCC and 1s with MSVC, while the version without the partial takes around 90s with GCC and 104s with MSVC, a pretty absurd difference. I didn't try it, but I imagine even a sequence of naive for loops would be a lot faster than 90 seconds...

#include <iostream>
#include <vector>
#include <chrono>
#include <random>
#include <functional>
#include <cstdlib> // std::rand

#include "Eigen/Dense"

void printTimespan(std::chrono::nanoseconds timeSpan)
{
    using namespace std::chrono;
    std::cout << "Timing ended:\n"
        << "\t ms: " << duration_cast<milliseconds>(timeSpan).count() << '\n'
        << "\t us: " << duration_cast<microseconds>(timeSpan).count() << '\n'
        << "\t ns: " << timeSpan.count() << '\n';
}

class Timer
{
    std::chrono::steady_clock::time_point start_;
public:
    void start()
    {
        start_ = std::chrono::steady_clock::now();
    }
    void stop()
    {
        timings.push_back((std::chrono::steady_clock::now() - start_).count());
    }
    std::vector<long long> timings;
};

std::vector<float> buildBuffer(size_t rows, size_t cols)
{
    std::vector<float> buffer;
    buffer.reserve(rows * cols);
    for (size_t i = 0; i < rows; i++)
    {
        for (size_t j = 0; j < cols; j++)
        {
            buffer.push_back(std::rand() % 1000);
        }
    }
    return buffer;
}

using EigenArr = Eigen::Array<float, -1, -1, Eigen::RowMajor>;
using EigenMap = Eigen::Map<EigenArr>;

std::vector<float> benchmark(std::function<EigenArr(const EigenMap&)> func)
{
    constexpr size_t rows = 2000, cols = 200, repetitions = 1000;
    std::vector<float> buffer = buildBuffer(rows, cols);
    EigenMap map(buffer.data(), rows, cols);
    EigenArr res;
    std::vector<float> means; //just to prevent the compiler from not computing anything because the results aren't used

    Timer timer;
    for (size_t i = 0; i < repetitions; i++)
    {
        timer.start();
        res = func(map);
        timer.stop();
        means.push_back(res.mean());
    }

    Eigen::Map<Eigen::Vector<long long, -1>> timingsMap(timer.timings.data(), timer.timings.size());
    printTimespan(std::chrono::nanoseconds(timingsMap.sum()));
    return means;
}


int main()
{
    std::cout << "mean center rows\n";
    benchmark([](const EigenMap& map)
    {
        return (map.colwise() - map.rowwise().mean()).eval(); 
    });

    std::cout << "squared deviations\n";
    benchmark([](const EigenMap& map)
    {
        return (map.colwise() - map.rowwise().mean()).square().eval();
    });

    std::cout << "row variance with partial eval\n";
    benchmark([](const EigenMap& map)
    {
        EigenArr partial = (map.colwise() - map.rowwise().mean()).square().eval();
        return (partial.rowwise().sum() / (map.cols() - 1)).eval();
    });

    std::cout << "row variance\n";
    benchmark([](const EigenMap& map)
    {
        return ((map.colwise() - map.rowwise().mean()).square().rowwise().sum() / (map.cols() - 1)).eval();
    });
}

Answer:

I suspect the culprit is the nested rowwise() in the slower version.

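Many operations in Eigen are evaluated lazily, on demand, and don't create temporaries. This is done to prevent unnecessary copies of the data. But I suspect that every time the outer rowwise() asks for an element, the inner rowwise().mean() is computed again, so the row means are recomputed many times over instead of once per row. If that happens for every element, the work grows by up to a factor on the order of the column count (200 here), which would be in line with the roughly 100x to 160x slowdown you measured. Saving a copy once prevents each cell from being evaluated multiple times.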
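The suspected effect can be sketched without Eigen. This is a toy model under my own assumptions, not Eigen's actual implementation: RowMeanCounter, rowVarianceLazy and rowVarianceCached are hypothetical names introduced only for illustration. The sketch counts how often a row mean is computed when it is re-derived on every element read, compared with once per row.

```cpp
#include <cstddef>
#include <vector>

// Toy model only (NOT Eigen internals): wraps a row-major buffer and counts
// how many times a row mean is computed.
struct RowMeanCounter {
    const std::vector<float>& data;  // row-major storage, rows * cols elements
    std::size_t rows;
    std::size_t cols;
    mutable std::size_t meanCalls;   // number of row-mean computations so far

    RowMeanCounter(const std::vector<float>& d, std::size_t r, std::size_t c)
        : data(d), rows(r), cols(c), meanCalls(0) {}

    float rowMean(std::size_t r) const {
        ++meanCalls;
        float sum = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) sum += data[r * cols + c];
        return sum / static_cast<float>(cols);
    }
};

// "Lazy" variant: the row mean is recomputed for every element that is read,
// so it is computed rows * cols times in total.
std::vector<float> rowVarianceLazy(const RowMeanCounter& m) {
    std::vector<float> out(m.rows, 0.0f);
    for (std::size_t r = 0; r < m.rows; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < m.cols; ++c) {
            const float d = m.data[r * m.cols + c] - m.rowMean(r);
            acc += d * d;
        }
        out[r] = acc / static_cast<float>(m.cols - 1);
    }
    return out;
}

// "Materialized" variant: the row mean is computed once per row and reused.
std::vector<float> rowVarianceCached(const RowMeanCounter& m) {
    std::vector<float> out(m.rows, 0.0f);
    for (std::size_t r = 0; r < m.rows; ++r) {
        const float mean = m.rowMean(r);
        float acc = 0.0f;
        for (std::size_t c = 0; c < m.cols; ++c) {
            const float d = m.data[r * m.cols + c] - mean;
            acc += d * d;
        }
        out[r] = acc / static_cast<float>(m.cols - 1);
    }
    return out;
}
```

Both functions return the same variances; only meanCalls differs. For the 2000 x 200 matrix in the question, the lazy variant would compute a row mean 400,000 times per call and the cached variant 2,000 times.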

You could also do it on one line by calling .eval() after the square().

The other possibility is just a cache issue, if it's being forced to skip around in memory a lot.
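To make "skipping around in memory" concrete for a row-major buffer like the one in the question, here is a small sketch of the same sum computed with contiguous and with strided access. The function names are hypothetical and introduced only for illustration. Both functions return the same value; only the order in which memory is touched differs, which is what would matter if the cache explanation were the right one.

```cpp
#include <cstddef>
#include <vector>

// Walks the row-major buffer in storage order: consecutive reads are adjacent.
double sumRowByRow(const std::vector<float>& buf, std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += buf[r * cols + c];
    return total;
}

// Walks the same buffer column by column: consecutive reads are `cols`
// elements apart, so each read lands in a different part of memory.
double sumColumnByColumn(const std::vector<float>& buf, std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += buf[r * cols + c];
    return total;
}
```

Timing the two on a matrix of the size used in the benchmark would give a rough idea of how much access order alone can cost on a given machine.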
