Home > other >  Algorithm for assigning items to groups based on binary criteria
Algorithm for assigning items to groups based on binary criteria

Time:10-01

I'm working on an algorithmic problem that has me stumped. It is related to both the assignment problem and maximum cardinality matching. I'll be using it to match large amounts of data so it's important to be fast, ideally O(n2) or better.

Problem Statement

Consider a set of universities U and a set of prospective students S. Assume that |U| << |S|. Each university has a specific number of slots available to admit students, and a binary criteria for whether each student in S is eligible or ineligible to attend. The number of prospective students is equal to the total number of slots available across all universities.

Is there a fast algorithm that will assign students to universities such that:

  • Each university admits exactly the number of students they wanted
  • Only students eligible for a university will be assigned to that university
  • If those conditions are not possible, it will return some value indicating that it was not possible (either a best possible incomplete match for the conditions or just -1/error/etc)

What I've considered

This can be modeled as a maximum-cardinality bipartite matching problem.

Construct the bipartite as follows:

  • Each prospective student is a node
  • Each university is a node
  • For each university, draw an edge to each eligible student
  • Duplicate each university node to represent the number of available slots for that university

The Hopcraft-Karp algorithm can find a maximum-cardinality matching in roughly O(n2.5) (where n is the number of students). If the cardinality of the match is equal to the number of students, the problem has been solved. If not, a solution is impossible for the given dataset.

However this approach seems to be inefficient because there are far more students than universities. Each university node has to be duplicated many times, resulting in a huge number of edges for the Hopcraft-Karp algorithm to process.

Is there a better way of abstracting this problem that I'm not seeing? Or is there a clever algorithm that can make use of the fact that there are far fewer universities than students and improve the runtime?

CodePudding user response:

There’s a simple O(|S| |U|²)-time algorithm, which meets your criterion in the worst case if |U| = O(√|S|) and likely does a lot better than the quoted running time in practice. (See also “bipush” variants of standard flow algorithms.)

The basis of this algorithm is Ford–Fulkerson. For each student in turn, we try to assign them a university. If we’re lucky, one of their choices has capacity, but in general, we need to shuffle other students around to make capacity (find an augmenting path).

Now, what typically makes finding an augmenting path expensive is that we have to traverse the whole graph. However! If we keep a |U|×|U| array where the (i, j) entry is a list of students that can move from university i to university j, then this takes O(|U|²) time (breadth-first search from the set of admissible universities of the new student; we only need to know if each cell of the array is empty). We (re)assign at most |U| students, each of which generates O(|U|) work to update the lists, so the total cost per iteration is O(|U|²).

One nice property of this algorithm in practice (many flow algorithms, actually) is that, if you feed it a partial assignment with k unmatched students, then the running time is O(|S| |U| k |U|²). The idea would be to find a fast heuristic that does a reasonable but incomplete job (e.g., shuffle the students and try to assign each one to the admissible university that is least popular with other students).

CodePudding user response:

I have implemented a two stage algorithm that is simple and fast.

There are two stages:

1 A heuristic that simply assigns each student to the least popular university that has room and for which they are eligible. This is based on a suggestion by David Eisenstat

Often this is sufficient to fill all the slots. When is is not, the second stage runs

2 MakeRoom searches for an assigned student that can be moved to another of their eligible universities to make room for a unassigned student.

Here is a typical output for a feasible randomly generated problem

random problem generated with 2488 university slots and 2500 students
2270 students assigned by hueristic
218 students assigned by MakeRoom
SUCCESS - all slots filled

The application for this is at https://github.com/JamesBremner/students2university

How large are the problems you are dealing with? If you let me know I can run a timing test on such a problem. The ~ 2500 slot problem runs in less than a second.

Note that the MakeRoom is not mathematically guaranteed to produce a perfect solution where every slot is filled. Problems can be imagined that would require several students being moved to make room for one more. However, I have run this many times on small and large problems and, so long as the problem is feasible, a perfect solution is always found. I believe this algorithm is suitable for practical use since it is easy to code, runs fast and if not perfect it will fill all the slots almost always and if not will only leave a few unfilled

Some more timing measures

slots secs
2500 0.005
25,000 0.25
250,000 26
  • Related