The optimal way to split Eigen MatrixXd into fixed-size batches with randomly shuffled rows

Time:05-02

I have input and target data represented as MatrixXd (N x M) and VectorXd (N). The goal is to create mini-batches of size K, each consisting of a subset of the input and target data shuffled in the same way. The ML model will then process these mini-batches in a loop. Could you recommend how to achieve this with as little copying as possible (ideally with a code example)?

Answer:

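Eigen comes with a Transpositions type that does just that. It applies a sequence of row or column swaps in place, so you can keep reshuffling the same matrix over and over again without allocating a new one. Note that the code below stores one sample per column (an M x N matrix, the transpose of the layout in the question).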

#include <Eigen/Dense>

#include <algorithm>  // std::min
#include <cassert>
#include <random>     // std::default_random_engine, std::uniform_int_distribution


void shuffle_apply(Eigen::Ref<Eigen::MatrixXd> mat,
                   Eigen::Ref<Eigen::VectorXd> vec,
                   int generations, int batchsize)
{
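  // Samples are columns: with Eigen's default column-major storage each
  // column is contiguous, so swapping and slicing columns beats rows.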
  const Eigen::Index size = mat.cols();
  assert(vec.size() == size);
  using Transpositions = Eigen::Transpositions<
    Eigen::Dynamic, Eigen::Dynamic, Eigen::Index>;
  Transpositions transp(size);
  Eigen::Index* transp_indices = transp.indices().data();
  std::default_random_engine rng; // seed appropriately!
  for(int gen = 0; gen < generations; ++gen) {
    for(Eigen::Index i = 0; i < size; ++i) {
      std::uniform_int_distribution<Eigen::Index> distr(i, size - 1);
      transp_indices[i] = distr(rng);
    }
    mat = mat * transp; // operates in-place
    vec = transp * vec; // transp on left side to shuffle rows, not cols
    for(Eigen::Index start = 0; start < size; start += batchsize) {
      const Eigen::Index curbatch = std::min<Eigen::Index>(
            batchsize, size - start);
      // Blocks are views into mat and vec; nothing is copied here.
      const auto mat_batch = mat.middleCols(start, curbatch);
      const auto vec_batch = vec.segment(start, curbatch);
      // process mat_batch / vec_batch with the model here
    }
  }
}

See also Permute Columns of Matrix in Eigen and similar questions.

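EDIT: An older version of this initialized the indices via std::shuffle, which I believe is wrong: a shuffled index vector is a permutation, not a sequence of transpositions, so applying it as a series of swaps does not produce a uniformly random order.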
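The difference can be checked exhaustively for small sizes without Eigen. The sketch below applies swap sequences the way a Transpositions object is applied (swap element k with element t[k] for k = 0..n-1) and counts the resulting orderings. It assumes the older version shuffled the identity 0..n-1 and used the result directly as the indices; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Apply a transposition sequence: for k = 0..n-1, swap a[k] with a[t[k]].
inline std::vector<int> apply_transpositions(const std::vector<int>& t)
{
  std::vector<int> a(t.size());
  std::iota(a.begin(), a.end(), 0);
  for (std::size_t k = 0; k < t.size(); ++k)
    std::swap(a[k], a[static_cast<std::size_t>(t[k])]);
  return a;
}

// Answer's scheme: t[k] ranges over [k, n-1]. Enumerate every possible t
// and count how often each resulting ordering occurs.
inline std::map<std::vector<int>, int> count_fisher_yates(int n)
{
  std::map<std::vector<int>, int> counts;
  std::vector<int> t(static_cast<std::size_t>(n));
  std::iota(t.begin(), t.end(), 0);  // smallest choice for each t[k] is k
  while (true) {
    ++counts[apply_transpositions(t)];
    // odometer step where digit k runs from k to n-1
    int k = n - 1;
    while (k >= 0 && t[k] == n - 1) { t[k] = k; --k; }
    if (k < 0) break;
    ++t[k];
  }
  return counts;
}

// Older scheme (assumed): a shuffled permutation of 0..n-1 reused as the
// transposition indices. Enumerate every permutation std::shuffle can return.
inline std::map<std::vector<int>, int> count_shuffled_indices(int n)
{
  std::map<std::vector<int>, int> counts;
  std::vector<int> p(static_cast<std::size_t>(n));
  std::iota(p.begin(), p.end(), 0);
  do {
    ++counts[apply_transpositions(p)];
  } while (std::next_permutation(p.begin(), p.end()));
  return counts;
}
```

For n = 3, the [k, n-1] scheme hits each of the 6 orderings exactly once, so drawing each t[k] uniformly gives a uniform shuffle. The shuffled-indices scheme reaches only 3 of the 6 orderings and returns the identity for 4 of the 6 inputs.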
