How do I find a certain subset of the binomial coefficient-CodePudding

Let's suppose we have an array with length n. We'll call this array

keys[n] = {...}

What I am looking for is a certain array of subsets given by "n choose 3". We'll call this

combination[?][3] = {...}

This array needs to meet the following criteria:

Each subset of length 2 of the 3 keys in each element in array combination ("3 choose 2") has to appear at least in one other element in combination
Every key has to appear in at least one element in combination (actually in two elements because of the previous criterium)
The length of combination has to be as small as possible (so out of all solutions that satisfy the above two criteria, we need to pick one with minimum length)
Optional: combination is random everytime but still at minimum length
Optional: No subset of length 2 of the 3 keys in each element in array combination ("3 choose 2") appears particularly more often than others.

Here's an example:

Let keys[5] = {1,2,3,4,5};
"4 choose 3" yields the following 10 subsets: {1,2,3}, {1,2,4}, {1,2,5}, {1,3,4}, {1,3,5}, {1,4,5}, {2,3,4}, {2,4,5}, {2,3,5}, {3,4,5}
So one solution would be: {1,2,3}, {1,2,4}, {1,3,4}, {2,3,5}, {2,4,5}, {3,4,5} (at least I didn't find a shorter one)

I've been trying to solve this problem all day. I only managed to come up with one really convoluted algorithm that doesn't even work.

Does anyone know how to solve this or even just what you might call this problem?

CodePudding user response：

Here is the original solution to a single cover (oops).

Some solutions are more likely than others, but it is still pretty random. It is fast if not a lot of backtracking happens. So 13, for example, runs fast. But 8 runs slowly because the shortest solution has 11 triples, and it has to fail at 10 over and over again before it accepts that.

The idea is an A* search for the shortest possible solution. With some technical details explained in the code.

Note that it found the following solution for 5: [[1, 2, 5], [1, 3, 4], [2, 3, 5], [2, 4, 5]] This is shorter than you had.

import math
import heapq
import random

def min_cover (n):
    # Small helper function.
    def next_pair(pair):
        i, j = pair
        if j < n:
            return [i, j 1]
        elif i 1 < n:
            return [i 1, i 2]
        else:
            raise StopIteration

    # Another small helper function.
    def pair_in(pair, groups):
        i, j = pair
        while groups is not None:
            if i in groups[0] and j in groups[0]:
                return True
            groups = groups[1]
        return False

    numbers = [_ for _ in range(1, n 1)]
    # Queue will be a priority queue of:
    # [
    #    min_pairs_at_finish,
    #    neg_pair,
    #    calculation_for_min_pairs_at_finish,
    #    random_to_avoid_more_comparisons,
    #    last_pair,
    #    [triple, [triple, [triple, ...[None], ...]]]
    # ]
    #
    # The difference between min_pairs and its calculation
    # is subtle.  In the end, the number of pairs is a
    # multiple of 3.  So if we've calculated a minimum
    # of 10 airs, we ACTUALLY can't finish with less than 12.
    #
    # The reason for the ordering is as follows.
    #
    #     min_pairs_at_finish: We want as few as possible.
    #     neg_pair: Prefer continuing solutions that are far along.
    #     calculation_for_min_pairs_at_finish: Closer to done is better
    #     random_to_avoid_more_comparisons: Comparing complex data
    #       is slow.  Don't.  Bonus, this randomizes the answer!
    #     last_pair: Where to continue from.
    #     groups: The solution so far. This data structure efficiently
    #       reuses memory between different partial solutions.
    #
    queue = [[0, [-1, -1], n*(n-1)/2, 0, [1, 1], None]]
    while 0 < len(queue):
        min_cost, neg_pair, min_calc_cost, r, pair, groups = heapq.heappop(queue)
        try:
            pair = next_pair(pair)
            while pair_in(pair, groups):
                pair = next_pair(pair)

            for k in numbers:
                if k != pair[0] and k != pair[1]:
                    extra = 0
                    if pair_in([pair[0], k], groups):
                        extra  = 1
                    if pair_in([pair[1], k], groups):
                        extra  = 1
                    next_item = [
                        3 * math.ceil((min_cost   extra)/3),
                        [-pair[0], -pair[1]],
                        min_cost   extra,
                        random.random(),
                        pair,
                        [sorted([pair[0], pair[1], k]), groups],
                    ]
                    heapq.heappush(queue, next_item)
        except StopIteration:
            answer = []
            while groups is not None:
                answer.append(groups[0])
                groups = groups[1]
            return list(reversed(answer))

print(min_cover(5))
queue = [[0, n*(n-1)/2, 0, [1, 1], None]]
while 0 < len(queue):
    min_cost, min_calc_cost, r, pair, groups = heapq.heappop(queue)
    try:
        pair = next_pair(pair)
        while pair_in(pair, groups):
            pair = next_pair(pair)

        for k in numbers:
            if k != pair[0] and k != pair[1]:
                extra = 0
                if pair_in([pair[0], k], groups):
                    extra  = 1
                if pair_in([pair[1], k], groups):
                    extra  = 1
                next_item = [
                    3 * math.ceil((min_cost   extra)/3),
                    min_cost   extra,
                    random.random(),
                    pair,
                    [sorted([pair[0], pair[1], k]), groups],
                ]
                heapq.heappush(queue, next_item)
    except StopIteration:
        answer = []
        while groups is not None:
            answer.append(groups[0])
            groups = groups[1]
        return list(reversed(answer))

 print(min_cover(5))

And here is a solution to the double cover problem that was actually wanted using the same technique.

import math
import heapq
import random

def min_double_cover (n):
    # Small helper function.
    def next_pair(pair):
        i, j = pair
        if j < n:
            return [i, j 1]
        elif i 1 < n:
            return [i 1, i 2]
        else:
            raise StopIteration

    # Another small helper function.
    def double_pair_in(pair, groups):
        i, j = pair
        answer = 0
        while groups is not None:
            if i in groups[0] and j in groups[0]:
                answer  = 1
                if 2 <= answer:
                    return True
            groups = groups[1]
        return False

    def triple_in(triple, groups):
        i, j, k = triple
        while groups is not None:
            if i in groups[0] and j in groups[0] and k in groups[0]:
                return True
            groups = groups[1]
        return False

    numbers = [_ for _ in range(1, n 1)]
    # Queue will be a priority queue of:
    # [
    #    min_pairs_at_finish,
    #    neg_pair,
    #    calculation_for_min_pairs_at_finish,
    #    random_to_avoid_more_comparisons,
    #    last_pair,
    #    [triple, [triple, [triple, ...[None], ...]]]
    # ]
    #
    # The difference between min_pairs and its calculation
    # is subtle.  In the end, the number of pairs is a
    # multiple of 3.  So if we've calculated a minimum
    # of 10 airs, we ACTUALLY can't finish with less than 12.
    #
    # The reason for the ordering is as follows.
    #
    #     min_pairs_at_finish: We want as few as possible.
    #     neg_pair: Prefer continuing solutions that are far along.
    #     calculation_for_min_pairs_at_finish: Closer to done is better
    #     random_to_avoid_more_comparisons: Comparing complex data
    #       structures is slow.  Don't.
    #     last_pair: Where to continue from.
    #     groups: The solution so far. This data structure efficiently
    #       reuses memory between different partial solutions.
    #
    queue = [[0, [-1, -2], n*(n-1), 0, [1, 2], None]]
    while 0 < len(queue):
        min_cost, neg_pair, min_calc_cost, r, pair, groups = heapq.heappop(queue)
        try:
            while double_pair_in(pair, groups):
                pair = next_pair(pair)

            for k in numbers:
                if k != pair[0] and k != pair[1] and not triple_in([pair[0], pair[1], k], groups):
                    extra = 0
                    if double_pair_in([pair[0], k], groups):
                        extra  = 1
                    if double_pair_in([pair[1], k], groups):
                        extra  = 1
                    next_item = [
                        3 * math.ceil((min_cost   extra)/3),
                        [-pair[0], -pair[1]],
                        min_cost   extra,
                        random.random(),
                        pair,
                        [sorted([pair[0], pair[1], k]), groups],
                    ]
                    heapq.heappush(queue, next_item)
        except StopIteration:
            answer = []
            while groups is not None:
                answer.append(groups[0])
                groups = groups[1]
            return list(reversed(answer))

print(min_double_cover(5))

This time I couldn't run it for 8 at all. No clue why not. But lots of other numbers are fast.

At the expense of a potentially incorrect answer, here is a change to make it finish:

...
   queue = [[0, [-1, -2], n*(n-1), 0, [1, 2], None]]
    min_pairs = 3*math.ceil(n*(n-1)/3)
    threshold = 1000000
    next_threshold = threshold
    while 0 < len(queue):
        if next_threshold < len(queue):
            print("Threshold reached", next_threshold)
            min_pairs  = 3
            for x in queue:
                x[0] = max(min_pairs, x[0])
            heapq.heapify(queue)
            next_threshold  = threshold
        min_cost, neg_pair, min_calc_cost, r, pair, groups = heapq.heappop(queue)
...
                    next_item = [
                        max(min_pairs, 3 * math.ceil((min_cost   extra)/3)),
                        [-pair[0], -pair[1]],
                        min_cost   extra,
                        random.random(),
                        pair,
                        [sorted([pair[0], pair[1], k]), groups],
                    ]
...

And I found a bug. The following should now be correct.

The numbers where it couldn't figure it out fast, like 8, seem to be the ones where it has to backtrack to a larger number of triples. If you get the threshold message once, then the answer is possibly off by 1 but probably is right. If you get it twice, ditto.

import math
import heapq
import random

def min_double_cover (n):
    # Small helper function.
    def next_pair(pair):
        i, j = pair
        if j < n:
            return [i, j 1]
        elif i 1 < n:
            return [i 1, i 2]
        else:
            raise StopIteration

    # Another small helper function.
    def double_pair_in(pair, groups):
        i, j = pair
        answer = 0
        while groups is not None:
            if i in groups[0] and j in groups[0]:
                answer  = 1
                if 2 <= answer:
                    return True
            groups = groups[1]
        return False

    def triple_in(triple, groups):
        i, j, k = triple
        while groups is not None:
            if i in groups[0] and j in groups[0] and k in groups[0]:
                return True
            groups = groups[1]
        return False

    numbers = [_ for _ in range(1, n 1)]
    # Queue will be a priority queue of:
    # [
    #    min_pairs_at_finish,
    #    neg_pair,
    #    calculation_for_min_pairs_at_finish,
    #    random_to_avoid_more_comparisons,
    #    last_pair,
    #    [triple, [triple, [triple, ...[None], ...]]]
    # ]
    #
    # The difference between min_pairs and its calculation
    # is subtle.  In the end, the number of pairs is a
    # multiple of 3.  So if we've calculated a minimum
    # of 10 pairs, we ACTUALLY can't finish with less than 12.
    #
    # The reason for the ordering is as follows.
    #
    #     min_pairs_at_finish: We want as few as possible.
    #     neg_pair: Prefer continuing solutions that are far along.
    #     calculation_for_min_pairs_at_finish: Closer to done is better
    #     random_to_avoid_more_comparisons: Comparing complex data
    #       structures is slow.  Don't.
    #     last_pair: Where to continue from.
    #     groups: The solution so far. This data structure efficiently
    #       reuses memory between different partial solutions.
    #
    queue = [[0, [-1, -2], n*(n-1), 0, [1, 2], None]]
    min_pairs = 3*math.ceil(n*(n-1)/3)
    threshold = 100000
    next_threshold = threshold
    while 0 < len(queue):
        if next_threshold < len(queue):
            print("Threshold reached", next_threshold)
            min_pairs  = 3
            for x in queue:
                x[0] = max(min_pairs, x[0])
            heapq.heapify(queue)
            next_threshold  = threshold
        min_cost, neg_pair, min_calc_cost, r, pair, groups = heapq.heappop(queue)
        try:
            while double_pair_in(pair, groups):
                pair = next_pair(pair)

            for k in numbers:
                if k != pair[0] and k != pair[1] and not triple_in([pair[0], pair[1], k], groups):
                    extra = 0
                    if double_pair_in([pair[0], k], groups):
                        extra  = 1
                    if double_pair_in([pair[1], k], groups):
                        extra  = 1
                    next_item = [
                        max(min_pairs, 3 * math.ceil((min_calc_cost   extra)/3)),
                        [-pair[0], -pair[1]],
                        min_calc_cost   extra,
                        random.random(),
                        pair,
                        [sorted([pair[0], pair[1], k]), groups],
                    ]
                    heapq.heappush(queue, next_item)

        except StopIteration:
            answer = []
            while groups is not None:
                answer.append(groups[0])
                groups = groups[1]
            return list(reversed(answer))

print(min_double_cover(8))