I have input and target data represented as a MatrixXd (N x M) and a VectorXd (N). The goal is to create mini-batches of size K, each consisting of a subset of the input and target data shuffled in the same way. The ML model will then process these mini-batches in a loop. Could you recommend how to achieve this with as little copying as possible (ideally with a code example)?
CodePudding user response:
Eigen comes with a Transpositions type that does just that. It works in-place by swapping rows or columns. So you can just keep shuffling the same matrix over and over again.
#include <Eigen/Dense>
#include <algorithm>
// using std::min
#include <cassert>
#include <random>
// using std::default_random_engine, std::uniform_int_distribution

// mat stores one sample per column (M x N), i.e. transposed relative to
// the N x M layout in the question; vec holds the N targets
void shuffle_apply(Eigen::Ref<Eigen::MatrixXd> mat,
                   Eigen::Ref<Eigen::VectorXd> vec,
                   int generations, int batchsize)
{
    // colwise is faster than rowwise
    const Eigen::Index size = mat.cols();
    assert(vec.size() == size);
    assert(batchsize > 0); // otherwise the batch loop below never advances
    using Transpositions = Eigen::Transpositions<
        Eigen::Dynamic, Eigen::Dynamic, Eigen::Index>;
    Transpositions transp(size);
    Eigen::Index* transp_indices = transp.indices().data();
    std::default_random_engine rng; // seed appropriately!
    for(int gen = 0; gen < generations; ++gen) {
        // Fisher-Yates: position i is swapped with a random position in [i, size - 1]
        for(Eigen::Index i = 0; i < size; ++i) {
            std::uniform_int_distribution<Eigen::Index> distr(i, size - 1);
            transp_indices[i] = distr(rng);
        }
        mat = mat * transp; // operates in-place
        vec = transp * vec; // transp on left side to shuffle rows, not cols
        for(Eigen::Index start = 0; start < size; start += batchsize) {
            // the last batch may be shorter than batchsize
            const Eigen::Index curbatch = std::min<Eigen::Index>(
                batchsize, size - start);
            // both batches are views into mat and vec, not copies
            const auto mat_batch = mat.middleCols(start, curbatch);
            const auto vec_batch = vec.segment(start, curbatch);
            // pass mat_batch and vec_batch to the model here
        }
    }
}
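The batch boundaries used in the inner loop can be checked in isolation. The following is a minimal standard-library sketch; `batch_ranges` is a hypothetical helper introduced here for illustration and is not part of Eigen or of the answer's code. It returns the (start, length) pair of every mini-batch, including a shorter final batch when the sample count is not a multiple of the batch size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not an Eigen function): lists the (start, length)
// pair of each mini-batch over `size` samples.
std::vector<std::pair<std::size_t, std::size_t>>
batch_ranges(std::size_t size, std::size_t batchsize)
{
    assert(batchsize > 0); // a zero batch size would never advance the loop
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t start = 0; start < size; start += batchsize) {
        // The final batch is truncated to the samples that remain.
        ranges.emplace_back(start, std::min(batchsize, size - start));
    }
    return ranges;
}
```

Each pair corresponds to the arguments one would pass to `middleCols(start, length)` and `segment(start, length)` in the loop above.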
See also "Permute Columns of Matrix in Eigen" and similar questions.
EDIT: An older version of this answer initialized the indices via std::shuffle, which I think is wrong.
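To illustrate why the indices are drawn this way, here is a standard-library sketch of what the transposition indices mean, under the assumption (my reading of Eigen's behavior, not something stated in its documentation quoted here) that entry i means "swap element i with element indices[i]" and that the swaps are applied in order. `draw_transpositions` and `apply_transpositions` are hypothetical names used only for this example. Drawing entry i from [i, n - 1] makes the sequence of swaps a Fisher-Yates shuffle, whereas a shuffled identity vector is a permutation, not a list of swaps.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: draws Fisher-Yates swap targets, entry i uniformly
// from [i, n - 1], mirroring the index loop in the answer.
std::vector<std::size_t> draw_transpositions(std::size_t n, std::mt19937& rng)
{
    std::vector<std::size_t> idx(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::uniform_int_distribution<std::size_t> distr(i, n - 1);
        idx[i] = distr(rng);
    }
    return idx;
}

// Hypothetical helper: applies the swaps in order, exchanging element i
// with element idx[i]. Works in place, without a temporary copy of data.
template <typename T>
void apply_transpositions(std::vector<T>& data,
                          const std::vector<std::size_t>& idx)
{
    assert(idx.size() == data.size());
    for (std::size_t i = 0; i < idx.size(); ++i) {
        std::swap(data[i], data[idx[i]]);
    }
}
```

Applying the same index vector to the inputs and to the targets keeps every input paired with its target, which is the property the mini-batch loop relies on.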