Algorithm for random sampling under multiple no-repeat conditions-CodePudding

I ran into the following issue:

So, I got an array of 100-1000 objects (size varies), e.g.something like

[{one:1,two:'A',three: 'a'}, {one:1,two:'A',three: 'b'}, {one:1,two:'A',three: 'c'}, {one:1,two:'A',three: 'd'},
{one:1,two:'B',three: 'a'},{one:2,two:'B',three: 'b'},{one:1,two:'B',three: ':c'}, {one:1,two:'B',three: 'd'},
{one:1,two:'C',three: 'a'},{one:1,two:'C',three: 'b'},{one:1,two:'C',three: ':c'}, {one:2,two:'C',three: 'd'},
{one:1,two:'C',three: 'a'},{one:1,two:'C',three: 'b'},{one:2,two:'C',three: ':c'}, {one:1,two:'C',three: 'd'},...]

The value for 'one' is pretty much arbitrary. 'two' and 'three' have to be balanced in a certain way: Basically, in the above, there is some n, such that n=4 times 'A'. 'B','C','D','a','b','c' and 'd' - and such an n exists in any variant of this problem. It is just not clear what the n is, and the combinations themselves can also vary (e.g. if we only had As and Bs, [{1,A,a},{1,A,a},{1,B,b},{1,B,b}] as well as [{1,A,a},{1,A,b},{1,B,a},{1,B,b}] would both be possible arrays with n=2).

What I am trying to do now, is randomise the original array with the condition that there cannot be repeats in close order for some keys, i.e. the value of 'two' and 'three' for an object at index i-1 cannot be the same as the value of same attribute for the object at index i (and that should be true for all or as many objects as possible), i.e. [{1,B,a},{1,A,a},{1,C,b}] would not be allowed, [{1,B,a},{1,C,b},{1,A,a}] would be allowed.

I tried some brute-force method (randomise all, then push wrong indexes to the back) that works rarely, but it mostly just loops infinitely over the whole array, because it never ends up without repeats. Not sure, if this is because it is generally mathematically impossible for some original arrays, or if it is just because my solution sucks.

By now, I've been looking for over a week, and I am not even sure how to approach this. Would be great, if someone knew a solution for this problem, or at least a reason why it isn't possible. Any help is greatly appreciated!

CodePudding user response：

First, let us dissect the problem. Forget for now about one, separate two and three into two independent sequences (assuming they are indeed independent, and not tied to each other). The underlying problem is then as follows.

Given is a collection of c1 As, c2 Bs, c3 Cs, and so on. Place them randomly in such a way that no two consecutive letters are the same.

The trivial approach is as follows. Suppose we already placed some letters, and are left with d1 As, d2 Bs, d3 Cs, and so on. What is the condition when it is impossible to place the remaining letters? It is when the count for one of the letters, say dk, is greater than one plus the sum of all other counts, 1 d1 d2 ... excluding dk. Otherwise, we can place them as K . K . K . K ..., where K is the k-th letter, and dots correspond to any letter except the k-th. We can proceed at least as long as dk is still the greatest of the remaining quantities of letters.

So, on each step, if there is a dk equal to 1 d1 d2 ... excluding dk, we should place the k-th letter right now. Otherwise, we can place any other letter and still be able to place all others. If there is no immediate danger of not being able to continue, adjust the probabilities to your liking, for example, weigh placing k-th letter as dk (instead of uniform probabilities for all remaining letters).

CodePudding user response：

This problem smells of NP complete and lots of hard combinatorial optimization problems.

Just to find a solution, I'd always place as the next element the remaining element that can be placed which as few possible remaining elements can be placed next to. In other words try to get the hardest elements out of the way first - if they run into a problem, then you're stuck. If that works, then you're golden. (There are data structures like a heap which can be used to find those fairly efficiently.)

Now armed with a "good enough" solver, I'd suggest picking the first element randomly until the solver can solve the rest. Repeat. If at any time you find it takes too many guesses, just go with what the solver did last time. That way all the way you know that there IS a solution, even though you are trying to do things randomly at every step.