Home > Net >  Choose n distinct elements from a vector with probability inverse-proportional to their index
Choose n distinct elements from a vector with probability inverse-proportional to their index

Time:10-22

Given a vector and a certain number of elements n, I'm looking for a way to choose n elements from a vector with probability inverse-proportional to their index.

Example:

std::vector v = {0, 1, 2, ... 998, 999};
n = 10;

A potential set of chosen indices may be:

{50, 200, 350, 500, 600, 700, 800, 850, 900, 950}

Notes:

  • The chosen indices need to be consistent between invocations.
  • The same index cannot be chosen twice in the same invocation result.
  • The density of indices from the beginning of the vector must be proportional to the density of indices from the end of the vector. I.e. the result {990 ... 999} is invalid for the given example.
  • I'd prefer to use as much code as I can from the standard library rather than having to implement by myself.
  • I may prefer a simple and working but less efficient solution over a complicated and efficient solution.

Thanks

CodePudding user response:

You can use a std::discrete_distribution:

#include <iostream>
#include <numeric>
#include <vector>
#include <random>
#include <map>

int main() {
    std::vector<int> v(10);
    std::iota(v.begin(),v.end(),1);

    std::vector<int> indices = v;
    //for (const auto& e : v) std::cout << e << " ";
    std::vector<double> probs(indices.size());
    std::transform(indices.begin(),indices.end(),probs.begin(),[](int x){ return 1.0/x;});
    //for (const auto& e : probs) std::cout << e << " ";

    // modified example from https://en.cppreference.com/w/cpp/numeric/random/discrete_distribution 
    std::random_device rd;
    std::mt19937 gen(rd());
    std::discrete_distribution<> d(probs.begin(),probs.end());
    std::map<int, int> m;
    for(int n=0; n<10000;   n) {
          m[d(gen)];
    }
    for(auto p : m) {
        std::cout << v[p.first] << " generated " << p.second << " times\n";
    }
}

For the sake of the example, v is just filled with 1 till v.size()-1 and because 1/0 is ill-formed, the indices start at 1 too (ie indices is simply a copy of v).

The above is choosing an element with weight 1/index. If you want to select each element only once you can set the respective weight to zero before choosing the next element.

CodePudding user response:

This paper describes a weighted random sampling method. Below is a C implementation.

This is assigning a weight of 1 / index for a 1-based indexing of your data

namespace views = std::ranges::views;

std::random_device rd;
std::mt19937 gen(rd()); // or whichever URBG you want
std::uniform_real_distribution<double> dist(0, 1);

std::map<double, std::size_t> weighted_indexes;
for (auto i : views::iota(0u, v.size())) {
    auto k = std::pow(dist(gen), 1.0 / (i   1));
    weighted_indexes[k] = i;
}

auto indexes = weighted_indexes | views::take(n) | views::values;
auto selected_values = indexes | views::transform([&v](std::size_t i){ return v[i]; });
  • Related