Best team-making algorithm to use?-CodePudding

I need to determine which sorting algorithm would be the best for my situation. I am running a tournament for 16 people, and each tournament, every person earns a certain number of points.

These 16 people are grouped into four teams of four, and their averages should ideally be as close together as possible (to make the fairest teams).

I am not sure which sorting algorithm would be best for this. I have tried this with a greedy bin packing algorithm, which did not produce optimal teams. I have also read that this might be a knapsack or cutting stock problem. From what I have read, I believe the new algorithm would ideally use offline sorting. If anybody knows what algorithm would be best for my situation, please let me know. My code is below:

from typing import List
import binpacking
import numpy as np
import math

levels = np.genfromtxt('points.txt', dtype = 'float')
players = np.genfromtxt('players.txt', dtype = 'str')
playerLevels = zip(players, levels)
playerLevelsList = list(playerLevels)

averageLevels = sum(levels)/len(levels)
print('Average points are', averageLevels)

bins = binpacking.to_constant_bin_number(playerLevelsList, 4, 1)

a, b, c, d = [ [individualArray] for individualArray in bins]

a = a[0]
b = b[0]
c = c[0]
d = d[0]
a1 = a[1]
b1 = b[1]
c1 = c[1]
d1 = d[1]

team1 = list(zip(*a))
team2 = list(zip(*b))
team3 = list(zip(*c))
team4 = list(zip(*d))

points1 = round(sum(team1[1])/4, 2)
points2 = round(sum(team2[1])/4, 2)
points3 = round(sum(team3[1])/4, 2)
points4 = round(sum(team4[1])/4, 2)

stdev1 = math.trunc(np.std(team1[1]))
stdev2 = math.trunc(np.std(team2[1]))
stdev3 = math.trunc(np.std(team3[1]))
stdev4 = math.trunc(np.std(team4[1]))

print('Team 1 is', team1[0])
print('Team 2 is', team2[0])
print('Team 3 is', team3[0])
print('Team 4 is', team4[0])
print('\n-----Stats-----')
print('Team 1 Average is', points1, 'with a standard deviation of', stdev1)
print('Team 2 Average is', points2, 'with a standard deviation of', stdev2)
print('Team 3 Average is', points3, 'with a standard deviation of', stdev3)
print('Team 4 Average is', points4, 'with a standard deviation of', stdev4)```

CodePudding user response：

This is a variant of the balanced number partitioning problem, so is NP-hard to solve exactly. However, your constraints are small enough that you can test all possible divisions into teams: there are 16! / (4!)^5 such divisions, which is less than 3 million.

You also need to define your closeness metric. Some possibilities are 'minimize the largest average', 'maximize the smallest average', standard deviation, 'minimize the difference between maximum and minimum average', etc. When there are 3 or more teams, these metrics are not equivalent, so will give different results on certain inputs.

player_list = [3, 4, 5, 7, 12, 13, 13, 17, 18, 19, 19, 22, 22, 22, 24, 28]

def closeness_measure(number_list):
    """Given a list of numbers, output a non-negative 'closeness'
    according to some metric, with smaller numbers for 'closer' sets"""

    # Minimizing the maximum
    return max(number_list)

    # Standard dev
    # return statistics.stdev(number_list)


def find_best_packing(values, number_of_bins: int, size_per_bin: int, func_to_minimize):
    """Given a list of values to distribute into a fixed number
     of bins of uniform cardinality, return a packing that minimizes
     the given closeness measure.

     Works by testing all combinations,
     so run-time increases at least exponentially in len(values)
     """

    n = len(values)

    if number_of_bins < 1:
        raise ValueError('Must have positive bin count')

    if size_per_bin < 1:
        raise ValueError('Must have positive bin size')

    if number_of_bins * size_per_bin != n:
        raise ValueError('Must provide number_of_bins * size_per_bin values')

    def evaluate_func_on_packing(packing):
        return func_to_minimize([sum((values[x] for x in value_bin)) / size_per_bin
                                 for value_bin in packing])

    if number_of_bins == 1:
        return [list(values)]
    if size_per_bin == 1:
        return [[x] for x in values]

    # arbitrary starting packing
    best_packing = [list(range(i, i   size_per_bin))
                    for i in range(0, n - size_per_bin, size_per_bin)]

    best_packing_value = evaluate_func_on_packing(best_packing)

    def update_best_if_necessary(candidate_packing):
        nonlocal best_packing, best_packing_value
        candidate_value = evaluate_func_on_packing(candidate_packing)

        if candidate_value < best_packing_value:
            best_packing = candidate_packing[:]
            best_packing_value = candidate_value

    def complete_packing(available_people: Set[int], current_packing) -> None:

        if len(available_people) == 0:
            update_best_if_necessary(current_packing)
            return

        smallest = min(available_people)
        available_people.remove(smallest)

        new_candidates = sorted(available_people)

        for combination in itertools.combinations(new_candidates, size_per_bin - 1):
            complete_packing(available_people=available_people.difference(combination),
                             current_packing=current_packing
                                               [[smallest]   list(combination)])

        available_people.add(smallest)

    complete_packing(available_people=set(range(n)), current_packing=[])

    return (best_packing, best_packing_value)


best_packing, best_value = find_best_packing(values=player_list,
                                             number_of_bins=4,
                                             size_per_bin=4,
                                             func_to_minimize=closeness_measure)

print(f'{player_list=}')
print(f'{best_packing=}')
print([[player_list[y] for y in value_bin] for value_bin in best_packing])
print([sum(value_bin) / 4 for value_bin in best_packing])
print(f'{best_value=}')

For example, using the 'minimize the maximum sum' metric:

player_list = [3, 4, 5, 7, 12, 13, 13, 17, 18, 19, 19, 22, 22, 22, 24, 28]

best_packing = [[3, 7, 24, 28],
                [4, 17, 19, 22],
                [5, 13, 22, 22],
                [12, 13, 18, 19]]

averages = [8.0, 7.0, 8.0, 7.0]

Using this recursive method, you can certainly find the exact answer for your constraints on a single input set (runs in under 10s on my laptop), using negligible memory.

There are two main ways to improve this:

Use approximate algorithms. Some of these are mentioned on the Wiki article for balanced number partitioning, but there are too many possibilities to list here.
Given a known closeness measure, such as 'minimize the maximum average', you can prune the search tree considerably, since we only consider combinations whose average is less than our current best. Also, you can use integer programming solvers and numerical methods to solve the problem exactly in very little time for your input sizes.

CodePudding user response：

Brute-force solution using the partition algorithm from this related question: Producing all groups of fixed-length combinations:

from itertools import combinations  # to generate teamups
from heapq import nsmallest         # to select best teamups

# generating all teamups

def partitions(s, r):
    """
    Generate partitions of the iterable `s` into subsets of size `r`.
    >>> list(partitions(set(range(4)), 2))
    [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    """
    s = set(s)
    assert(len(s) % r == 0)
    if len(s) == 0:
        yield ()
        return
    first = next(iter(s))
    rest = s.difference((first,))
    for c in combinations(rest, r - 1):
        first_subset = (first,)   c
        for p in partitions(rest.difference(c), r):
            yield (first_subset,)   p

# ranking measure for a given teamup

def closeness_measure(teamup):
    sums = [sum(team) for team in teamup]
    return max(sums) - min(sums)   # score of strongest team minus score of weakest team

Testing:

players = [5.27, 83.88, 60.5 , 97.37, 71.23, 12.74,  3.41, 40.21,
          13.14, 45.54, 24.1 , 98.24, 29.14, 12.49,  4.52, 20.06]

all_teamups = list(partitions(players, 4))

best_5_teamups = nsmallest(5, all_teamups, key=closeness_measure)

print(best_n_teamups)
# [((97.37, 5.27, 24.1, 29.14), (98.24, 4.52, 40.21, 12.49), (3.41, 71.23, 20.06, 60.5), (13.14, 83.88, 12.74, 45.54)),
#  ((97.37, 5.27, 12.49, 40.21), (98.24, 4.52, 24.1, 29.14), (3.41, 71.23, 20.06, 60.5), (13.14, 83.88, 12.74, 45.54)),
#  ((97.37, 5.27, 24.1, 29.14), (98.24, 4.52, 40.21, 12.74), (3.41, 71.23, 20.06, 60.5), (13.14, 83.88, 12.49, 45.54)),
#  ((97.37, 5.27, 12.74, 40.21), (98.24, 4.52, 24.1, 29.14), (3.41, 71.23, 20.06, 60.5), (13.14, 83.88, 12.49, 45.54)),
#  ((97.37, 4.52, 24.1, 29.14), (98.24, 5.27, 40.21, 12.49), (3.41, 71.23, 20.06, 60.5), (13.14, 83.88, 12.74, 45.54))]

Useful documentation: