I am working on a chess engine for some time now. For improving the engine, I wrote some code which loads chess-positions from memory into some tuner code. I have around 1.85B fens on my machine which adds up to 40Gb (24B per position).
After loading, I end up with a vector of positions:
struct Position{
std::bitset<8*24> bits{};
}
void main(){
std::vector<Position> positions{};
// mimic some data loading
for(int i = 0; i < 1.85e9; i ){
positions.push_back(Position{})
}
// ...
}
The data is organised in the following way:
The positions are taken from games where the positions are seperated by just a few moves. Usually about 40-50 consecutive moves come the same game / line and are therefor somewhat equal.
Eventually I will read 16384 position within a single batch and ideally none of those positions come from the same game. Therefor I do some initial sorting before using the data.
My current shuffling method is this:
auto rng = std::default_random_engine {};
std::shuffle(std::begin(positions), std::end(positions), rng);
Unfortunately this takes quiet some time (about 1-2 minutes). Since I dont require perfect shuffles, I assume that some easier shuffles exist.
My second aproach was:
for(int i = 0; i < positions.size(); i ){
std::swap(positions[i], positions[(i*16384) % positions.size()]);
}
which will ensure that there are not going to be positions coming from the same game within a single batch and are evenly spaces by 16384 entries.
I was wondering if there is some even simpler, faster solution. Especially considering that the modulo-operator requires quiet some clock cycles.
I am happy for any "trivial" solution.
Greetings Finn
CodePudding user response:
There is a tradeoff to be made: Shuffling a a std::vector<size_t>
of indices can be expected to be cheaper than shuffling a std::vector<Position>
at the cost of an indirection when accessing the Position
s via shuffled indices. Actually the example on cppreference for std::iota
is doing something along that line (it uses iterators):
#include <algorithm> #include <iostream> #include <list> #include <numeric> #include <random> #include <vector> int main() { std::list<int> l(10); std::iota(l.begin(), l.end(), -4); std::vector<std::list<int>::iterator> v(l.size()); std::iota(v.begin(), v.end(), l.begin()); std::shuffle(v.begin(), v.end(), std::mt19937{std::random_device{}()}); std::cout << "Contents of the list: "; for(auto n: l) std::cout << n << ' '; std::cout << '\n'; std::cout << "Contents of the list, shuffled: "; for(auto i: v) std::cout << *i << ' '; std::cout << '\n'; }
Instead of shuffling the list directly, a vector of iterators (with a std::vector
indices woud work as well) is shuffled and std::shuffle
only needs to swap iterators (/indices) rather than the more costly actual elements (in the example the "costly to swap" elements are just int
s).
For a std::list
I don't expect a big difference between iterating in order or iterating via shuffled iterators. On the other hand, for a std::vector
I do expect a significant impact. Hence, I would shuffle indices, then rearrange the vector once, and profile to see which performs better.
CodePudding user response:
Randomness won't guarantee that samplings don't get positions from the same game which you wanted to avoid. I propose following pseudo-shuffle that does prevent samplings from the same game (given sufficiently large population):
- let N be the length of the longest game 1
- let E be iterator to the end
- let i be random index
- while E != begin
- if i > E - begin
- i %= E - begin
- --N
- Swap elements at i and std::prev(E)
- Decrement E
- i = N
- if i > E - begin