Given a vector and a certain number of elements n, I'm looking for a way to choose n elements from a vector with probability inverse-proportional to their index.
Example:
std::vector v = {0, 1, 2, ... 998, 999};
n = 10;
A potential set of chosen indices may be:
{50, 200, 350, 500, 600, 700, 800, 850, 900, 950}
Notes:
- The chosen indices need to be consistent between invocations.
- The same index cannot be chosen twice in the same invocation result.
- The density of indices from the beginning of the vector must be proportional to the density of indices from the end of the vector. I.e. the result {990 ... 999} is invalid for the given example.
- I'd prefer to use as much code as I can from the standard library rather than having to implement by myself.
- I may prefer a simple and working but less efficient solution over a complicated and efficient solution.
Thanks
CodePudding user response:
You can use a std::discrete_distribution
:
#include <iostream>
#include <numeric>
#include <vector>
#include <random>
#include <map>
int main() {
std::vector<int> v(10);
std::iota(v.begin(),v.end(),1);
std::vector<int> indices = v;
//for (const auto& e : v) std::cout << e << " ";
std::vector<double> probs(indices.size());
std::transform(indices.begin(),indices.end(),probs.begin(),[](int x){ return 1.0/x;});
//for (const auto& e : probs) std::cout << e << " ";
// modified example from https://en.cppreference.com/w/cpp/numeric/random/discrete_distribution
std::random_device rd;
std::mt19937 gen(rd());
std::discrete_distribution<> d(probs.begin(),probs.end());
std::map<int, int> m;
for(int n=0; n<10000; n) {
m[d(gen)];
}
for(auto p : m) {
std::cout << v[p.first] << " generated " << p.second << " times\n";
}
}
For the sake of the example, v
is just filled with 1
till v.size()-1
and because 1/0
is ill-formed, the indices start at 1
too (ie indices
is simply a copy of v
).
The above is choosing an element with weight 1/index. If you want to select each element only once you can set the respective weight to zero before choosing the next element.
CodePudding user response:
This paper describes a weighted random sampling method. Below is a C implementation.
This is assigning a weight of 1 / index
for a 1-based indexing of your data
namespace views = std::ranges::views;
std::random_device rd;
std::mt19937 gen(rd()); // or whichever URBG you want
std::uniform_real_distribution<double> dist(0, 1);
std::map<double, std::size_t> weighted_indexes;
for (auto i : views::iota(0u, v.size())) {
auto k = std::pow(dist(gen), 1.0 / (i 1));
weighted_indexes[k] = i;
}
auto indexes = weighted_indexes | views::take(n) | views::values;
auto selected_values = indexes | views::transform([&v](std::size_t i){ return v[i]; });