How to create a large number of combinations lazily in C-CodePudding

I want to create a combination of K elements one each from K sets. Each set can have n elements in it.

set1 = {a1, a2, a3}
set2 = {b1, b2, b3 , b4}
set3 = {c1, c2}
Required Combinations  = {{a1,b1,c1}, {a1,b2,c1} ... {a3,b4,c2}}

Number of combinations = 3*4*2 =24

So if K is large and n is large we run into Out of Memory very quickly. Refer to the below code snippet how we are creating combinations today. If we create all the combinations in a case where K is relatively large, we go out of memory! So for instance, if K=20 and each set has 5 elements, the combinations are 5^20, which is extremely large in memory. So I want an alternative algorithm where I don't need to store all those combinations in memory all at a time before I start consuming the combinations.

   vector<vector<string>> setsToCombine;
   vector<vector<string>> allCombinations;

   vector<vector<string>> *current =
      new vector<vector<string>>{vector<string>()};
   vector<vector<string>> *next = new vector<vector<string>>();
   vector<vector<string>> *temp;
   for (const auto& oneSet : setsToCombine) {
      for (auto& cur : *current) {
         for (auto& oneEle : oneSet) {
            cur.push_back(oneEle);
            next->push_back(cur);
            cur.pop_back();
         }
      }

      temp = current;
      current = next;
      next = temp;
      next->clear();
   }

   for (const auto& cur : *current) {
      allCombinations.push_back(cur);
   }

   current->clear();
   next->clear();

   delete current;
   delete next;

CodePudding user response：

You can store the indexes and lazely iterate over the combinations

#include <cstdint>
#include <iostream>
#include <vector>

using v_size_type = std::vector<int>::size_type;
using vv_size_type = std::vector<v_size_type>::size_type;

bool increment(std::vector<v_size_type> &counters, std::vector<v_size_type> &ranges) {
    for (auto idx = counters.size(); idx > 0; --idx) {
          counters[idx - 1];
        if (counters[idx - 1] == ranges[idx - 1]) counters[idx - 1] = 0;
        else return true;
    }
    return false;
}

std::vector<int> get(const std::vector<std::vector<int>> &sets, const std::vector<v_size_type> &counters) {
    std::vector<int> result(sets.size());
    for (vv_size_type idx = 0; idx < counters.size();   idx) {
        result[idx] = sets[idx][counters[idx]];
    }
    return result;
}

void print(const std::vector<int> &result) {
    for (const auto el : result) {
        std::cout << el << ' ';
    }
}

int main() {
    const std::vector<std::vector<int>> sets = {{-5, 2}, {-100, -21, 0, 15, 32}, {1, 2, 3}};
    std::vector<v_size_type> ranges(sets.size());
    for (vv_size_type idx = 0; idx < sets.size();   idx) {
        ranges[idx] = sets[idx].size();
    }
    std::vector<v_size_type> counters(sets.size());

    while (true) {
        print(get(sets, counters));
        std::cout << '\n';
        if (!increment(counters, ranges)) break;
    }
}

There are several disks, with values printed on it. And if the odometer runs forward, it will show the Cartesian product of all values on the disks.

That is somehow clear, but how to use this principle? The solution is, that each set of values will be a disk, and the values of the set, will be put on the corresponding disk. With that, we will have an odometer, where the number of values on each disk is different. But this does not matter.

Also here, if a disks overflows, the next disk is incremented. Same principle like a standard odometer. Just with maybe more or less values.

And, you can put everything on a disk, not just integers. This approach will work always.

We can abstract a disk as a std::vector of your desired type. And the odometer is a std::vector of disks.

All this we can design in a class. And if we add iterator functionality to the class, we can easily handle it.

In the example below, I show only a minimum set of functions. You can add as many useful functions to this class as you like and tailor it to your needs.

The object oriented approach is often better to understand in the end.

Please check:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <initializer_list>
#include <algorithm>
#include <iterator>

using MyType = int;
using Disk = std::vector<MyType>;
using Disks = std::vector<Disk>;

// Abstraction for a very simple odometer
class Odometer {
    Disks disks{};
public:
    // We will do nearly everything with the iterator of the odometer class
    struct iterator {
        // Definitions for iterator ----------------
        using iterator_category = std::forward_iterator_tag;
        using difference_type = std::ptrdiff_t;
        using value_type = std::vector<MyType>;
        using pointer = std::vector<MyType>*;
        using reference = std::vector<MyType>&;

        const Disks& d;                         // Reference to disks from super class
        int overflow{};                         // Indicates an overflow of all disks
        std::vector<std::size_t>positions{};    // Stores position of any disks

        // Iterator constructor
        iterator(const Disks& dd, const int over = 0) : d(dd), overflow(over) {
            positions = std::vector<std::size_t>(dd.size(), 0);
        }
        // Dereference iterator
        value_type operator*() const  {
            std::vector<MyType> result(d.size());
            for (std::size_t i{}; i < d.size();   i) result[i] = d[i][positions[i]];
            return result;
        };
        // Comparison
        bool operator != (const iterator& other) { return positions != other.positions or overflow != other.overflow; }
        // And increment the iterator
        iterator operator  () { 
            int carry = 0; std::size_t i{};
            for (i=0; i < d.size();   i) {
                if (positions[i] >= d[i].size() - 1) {
                    positions[i] = 0;
                    carry = 1;
                }
                else {
                      positions[i];
                    carry = 0;
                    break;
                }
            }
            overflow = (i == d.size() and carry) ? 1 : 0;
            return *this; 
        }
    };
    // Begin and End functions. End is true, if there is a flip over of all disks
    iterator begin() const { return iterator(disks); }
    iterator end() const { return iterator(disks, 1); }

    // Constructors
    Odometer() {};  // Default (useless for this example)
    // Construct from 2d initializer list
    Odometer(const std::initializer_list<const std::initializer_list<MyType>> iil) {
        for (const std::initializer_list<MyType>& il : iil) {
            disks.push_back(il);
        }
    }
    // Variadic. Parameter pack and fold expression
    template <typename ... Args>
    Odometer(Args&&... args) {
        (disks.push_back(std::forward<Args>(args)), ...);
    }
    // Simple output of everything
    friend std::ostream& operator << (std::ostream& os, const Odometer& o) {
        for (const auto vi : o) {
            for (const MyType i : vi) os << i << ' ';
            os << '\n';
        }
        return os;
    }
};

// Some test
int main() {

    // Define Odometer. Initialiaze wit normal initializer list
    Odometer odo1{ {1,2},{3},{4,5,6} };

    // Show complete output
    std::cout << odo1 << "\n\n\n";


    // Create additional 3 vectors for building a new cartesian product
    std::vector<MyType> v1{ 1,2 };
    std::vector<MyType> v2{ 3,4 };
    std::vector<MyType> v3{ 5,6 };

    // Define next Odometer and initialize with variadic constructor
    Odometer odo2(v1, v2, v3);

    // Use range based for loop for output
    for (const std::vector<MyType>& vm : odo2) {
        for (const MyType i : vm) std::cout << i << ' ';
        std::cout << '\n';
    }
}