Choose n distinct elements from a vector with probability inverse-proportional to their index-CodePudding

Given a vector and a certain number of elements n, I'm looking for a way to choose n elements from a vector with probability inverse-proportional to their index.

Example:

std::vector v = {0, 1, 2, ... 998, 999};
n = 10;

A potential set of chosen indices may be:

{50, 200, 350, 500, 600, 700, 800, 850, 900, 950}

Notes:

The chosen indices need to be consistent between invocations.
The same index cannot be chosen twice in the same invocation result.
The density of indices from the beginning of the vector must be proportional to the density of indices from the end of the vector. I.e. the result {990 ... 999} is invalid for the given example.
I'd prefer to use as much code as I can from the standard library rather than having to implement by myself.
I may prefer a simple and working but less efficient solution over a complicated and efficient solution.

Thanks

CodePudding user response：

You can use a std::discrete_distribution:

#include <iostream>
#include <numeric>
#include <vector>
#include <random>
#include <map>

int main() {
    std::vector<int> v(10);
    std::iota(v.begin(),v.end(),1);

    std::vector<int> indices = v;
    //for (const auto& e : v) std::cout << e << " ";
    std::vector<double> probs(indices.size());
    std::transform(indices.begin(),indices.end(),probs.begin(),[](int x){ return 1.0/x;});
    //for (const auto& e : probs) std::cout << e << " ";

    // modified example from https://en.cppreference.com/w/cpp/numeric/random/discrete_distribution 
    std::random_device rd;
    std::mt19937 gen(rd());
    std::discrete_distribution<> d(probs.begin(),probs.end());
    std::map<int, int> m;
    for(int n=0; n<10000;   n) {
          m[d(gen)];
    }
    for(auto p : m) {
        std::cout << v[p.first] << " generated " << p.second << " times\n";
    }
}

For the sake of the example, v is just filled with 1 till v.size()-1 and because 1/0 is ill-formed, the indices start at 1 too (ie indices is simply a copy of v).

The above is choosing an element with weight 1/index. If you want to select each element only once you can set the respective weight to zero before choosing the next element.

CodePudding user response：

This paper describes a weighted random sampling method. Below is a C implementation.

This is assigning a weight of 1 / index for a 1-based indexing of your data

namespace views = std::ranges::views;

std::random_device rd;
std::mt19937 gen(rd()); // or whichever URBG you want
std::uniform_real_distribution<double> dist(0, 1);

std::map<double, std::size_t> weighted_indexes;
for (auto i : views::iota(0u, v.size())) {
    auto k = std::pow(dist(gen), 1.0 / (i   1));
    weighted_indexes[k] = i;
}

auto indexes = weighted_indexes | views::take(n) | views::values;
auto selected_values = indexes | views::transform([&v](std::size_t i){ return v[i]; });