Choose One item from each Set without Duplication -- find all Solutions-CodePudding

I've looked at multiple forums and can't seem to find exactly what I'm looking for. I want to generate all possible solutions where I choose one number from each set of numbers, where a solution contains all numbers present and does not A) duplicate numbers and B) pull multiple numbers from one set. For an easy example, consider I have these five sets:

A: {1, 2, 3, 4, 5} 
B: {1, 2, 4}
C: {2, 3, 5}
D: {1, 3, 4, 5}
E: {3, 4}

Possible solutions will look something like:

{A:1, B:2, C:3, E:4, D:5}
{D:1, C:2, E:3, B:4, A:5}
...
etc.

As you can see, all numbers 1-5 are used exactly once and all sets A-E are featured exactly once. I know there are less than 120 solutions here (5!), because I cannot have a solution with E2 or D2 or C4, etc.

For concise examples such as this I can do the work manually, but I'm interested in doing this for a group of 50 sets, each containing up to 50 numbers, all within 1-50. Some categories will look like A, where the full set of numbers in the range are used, and some will look like E, where only a few numbers are used (for example, Set X may be simply {11, 19}).

So, if each set had all 50 numbers, I would have 50! (I think?) possible solutions. In my case, though, I know I can rule out any solution where Set X contributes a number other than 11 or 19, which significantly cuts down my solution space, especially when most of these categories will only have a few possible contributions.

Is there a simple way to generate solutions, or at least calculate the number of solutions?

I'm thinking I could:

1) sort all the sets from smallest length to largest length
2) choose a number from the first set (which may only have 1 possible choice)
3) eliminate that number from the remaining sets
4) move to the next set: If it has `0` elements, there is no possible solution using the previous selection of numbers
5) Repeat steps `1-4` until all sets have contributed a number: that is a solution (either print in full or add 1)

In theory, I would have (n1)*(n2-1)*(n3-1)*...*(1) possible solutions (where n1, n2, etc. is the size of that set), but where I'm getting tripped up is when I factor in the times when sets do and don't take numbers that other sets have, because I can't have duplicates. For instance, given Set A has every number: {1, 2, ..., 50} and Set B only has numbers 1-25: {1, 2, ..., 25}, if I choose a number >25 from Set A, then I have 25 possible choices from Set B. If I choose a number <=25 from Set A, then I only have 24 possible choices from Set B. I don't know how to factor in this uncertainty in my calculations, especially when I extrapolate this trend to more than a few sets of numbers.

I have a feeling that the sheer number of possible solutions would be too much for my computer to handle, but I'd at least like to know how many there are. I have all 50 sets of numbers if that would be useful in calculations. Any help would be appreciated with this. Thanks!

CodePudding user response：

A programmatic version (only realistic for small datasets) could be to invert the mappings, generate all combinations and ensuring we have a set of size 5:

d = {'A': {1, 2, 3, 4, 5},
     'B': {1, 2, 4},
     'C': {2, 3, 5},
     'D': {1, 3, 4, 5},
     'E': {3, 4}}

from collections import defaultdict
from itertools import product

d2 = defaultdict(set)
for k, s in d.items():
    for v in s:
        d2[v].add(k)
# {1: {'A', 'B', 'D'},
#  2: {'A', 'B', 'C'},
#  3: {'A', 'C', 'D', 'E'},
#  4: {'A', 'B', 'D', 'E'},
#  5: {'A', 'C', 'D'}}

out = [x for x in product(*d2.values()) if len(set(x))==len(d2)]

output:

[('A', 'C', 'E', 'B', 'D'),
 ('A', 'B', 'E', 'D', 'C'),
 ('A', 'B', 'D', 'E', 'C'),
 ('A', 'B', 'C', 'E', 'D'),
 ('D', 'A', 'E', 'B', 'C'),
 ('D', 'C', 'E', 'B', 'A'),
 ('D', 'B', 'E', 'A', 'C'),
 ('D', 'B', 'A', 'E', 'C'),
 ('D', 'B', 'C', 'E', 'A'),
 ('B', 'A', 'E', 'D', 'C'),
 ('B', 'A', 'D', 'E', 'C'),
 ('B', 'A', 'C', 'E', 'D'),
 ('B', 'C', 'E', 'A', 'D'),
 ('B', 'C', 'E', 'D', 'A'),
 ('B', 'C', 'D', 'E', 'A'),
 ('B', 'C', 'A', 'E', 'D')]

CodePudding user response：

Firstly, it is worth to mention that the term combinations isn't applicable here.

Combination is a selection of items from a set that has distinct members

And permutations represent different ways of arranging elements of a combination.

The key point, is that combination is selection done on a single source of data. Meanwhile, we have multiple Sets.

And if you pay attention to the result, which sample data provided in the question is expected to give, you'll find out that all these "combinations" consist of exactly the same elements 1,2,3,4,5 (because you've used only these numbers in all the sets), but combinations can't be comprised of completely identical elements.

The sequences that need to be generated are rather similar to Cartesian product, with one caveat: we need to discard some of them because of uniqueness requirement. So this term might not be suitable either. Hence, I will call them samples.

We can address the problem in the following steps:

Create a Queue that will contain all the resulting samples. Each sample would be represented by a Set.
Add the first empty sample to the queue.
Iterate over the given Collection of Sets and for every set based on the existing sample in the queue generate new samples containing all the elements from a current set that were not present previously. If there are no such elements, sample should be discarded.

Implementation might look like that:

public static Queue<Set<Integer>> findSamples(List<Set<Integer>> sets) {
    
    Queue<Set<Integer>> samples = new ArrayDeque<>(); // queue of samples
    samples.add(Collections.emptySet());              // the fist empty sample
    
    for (Set<Integer> set : sets) {
        Queue<Set<Integer>> updated = new ArrayDeque<>(); // updated samples removed from the queue would be accumulated here
        
        while (!samples.isEmpty()) {
            Set<Integer> sample = samples.remove();
            updated.addAll(tryAdd(sample, set));
        }
        
        samples = updated;
    }
    
    return samples;
}

public static List<Set<Integer>> tryAdd(Set<Integer> sample, Set<Integer> set) {
    List<Set<Integer>> newSamples = new ArrayList<>();
    
    for (Integer next : set) {
        if (sample.contains(next)) continue;
        
        Set<Integer> copy = new LinkedHashSet<>(sample); // NB: HashSet would be sufficient here, LinkedHashSet is used ONLY for demonstration purposes
        copy.add(next);
        newSamples.add(copy);
    }
    return newSamples;
}

main()

public static void main(String[] args) {
    List<Set<Integer>> sets = List.of(
        Set.of(1, 2, 3, 4, 5),
        Set.of(1, 2, 4),
        Set.of(2, 3, 5),
        Set.of(1, 3, 4, 5),
        Set.of(3, 4)
    );
    
    findSamples(sets)
        .forEach(System.out::println); // printing the result
}

Output:

[1, 4, 2, 5, 3]
[1, 2, 5, 4, 3]
[1, 2, 5, 3, 4]
[1, 2, 3, 5, 4]
[5, 4, 2, 1, 3]
[5, 2, 3, 1, 4]
[5, 1, 2, 4, 3]
[5, 1, 2, 3, 4]
[4, 2, 5, 1, 3]
[4, 1, 2, 5, 3]
[3, 2, 5, 1, 4]
[3, 1, 2, 5, 4]
[2, 4, 5, 1, 3]
[2, 1, 5, 4, 3]
[2, 1, 5, 3, 4]
[2, 1, 3, 5, 4]