The largest number of subsets for a set-CodePudding

Given a set of X numbers less than or equal to Y, which may contain repeated numbers:

which algorithm gives you the maximum number of subsets whose sum of their elements is greater than or equal to Y, where none of the elements of one subset can be contained in another, and each subset cannot repeat the same element.

(note: if in the initial set two numbers are repeated, each counts as a distinct element)

subsets can group elements into duos, trios, quartets or any other number.

doing two for loops to search for the best combination for the highest number worked for doubles, but given that it is possible to do trios and so on, cases like "1 1 1 1 1 7 8" would be suboptimized

CodePudding user response：

This is strongly NP hard since it contains as a special case the special case of the 3 partition problem of dividing a set into triplets that each have the same sum where all numbers are in the range of that sum/4 to that sum/2. And that special case is known to be strongly NP hard.

Therefore there is no known algorithm to solve it, and finding one would be a really big deal.

CodePudding user response：

You could implement a 'brute force' method and go through every possible partitioning and check if it satisfies your requirements. This would be quite simple, but horribly inefficient except for trivial cases.

Suppose you have N elements e_i in your set S, with 0 <= e_i <= Y. Choose numparts as the number of partitions you are going to try to create with element sum >= Y. Assuming sum e_i >= Y, we can set numparts = 1 initially, otherwise, obviously, the answer is zero..

Then you can generate partitions by creating an array of N elements p_i where 0 <= p_i < numparts. There are numparts^N possible such partitions!! Now, you have to try to find one in which for all 0 <= j < numparts, sum {e_i : p_i = j} >= Y. If you find one, increment numparts, if you don't, then you have your answer which is the largest numparts value for which you did find a qualifying partition.

You could improve the efficiency of this approach significantly by avoiding lots of partitions that don't have a sum >= Y. There are 'only' 2^N distinct subsets of S so the number of subsets with sums >=Y must be less than or equal to 2^N. For each such subset S_k, you can try to find the maximum number of partitions of S - S_k each with sums >= Y which is just a recursion of the same problem. This would give you the absolute maximum result you're looking for, but would still be a computational nightmare for non-trivial problems.

A quick-but-suboptimal algorithm is simply to sort the array in ascending order, then partition according to the partition sum as you process the sorted elements sequentially. e.g.

Suppose s[i] are the elements in the sorted array,...

partitionno = 0;
partitionsum = 0;
for (i=0; i<N; i  ) {
  partitionsum  = s[i];
  if (partitionsum >= Y) {
    partitionsum = 0;
    partitionno  ;
  }
}

giving partitionno subsets each having a sum of at least Y. Sorting can be done in O(N) time, and the algorithm above is also O(N) so you could use this for N in the 1000000s or more.