Algorithm for finding a set of N items based on percentages of how many times an item may appear-CodePudding

Let's say I have M boxes of different colors and sizes. I need to find a set of N boxes that contain different combinations of colors and sizes.

For example, the user could say they want the final set of 20 boxes from a bucket of 1500 boxes to contain 20-80% of blue boxes, 5-30% of boxes with a width of 20 inches, and 100% of boxes with a height of 20". I can pick any 20 boxes from the initial bucket of 1500 boxes as long as the final set of 20 boxes matches those three constraints.

This feels like the knapsack problem, but I'm not 100% convinced it is.

CodePudding user response：

Once, walk the list of M candidate boxes.

If adding the box meets the total criteria (see below), add it. Adjust the criteria based on box selected.

Continue until N boxes found.

This does not certainly end to a solution, yet likely does if a solution exists.

It is possible that there is no solution.

Criteria:

20-80% Blue
20-80% Non-Blue
5-30% are 20" wide
70-95% are not 20" wide
100% are 20" tall.

CodePudding user response：

First, reduce bucket size by boxes which match NOT the height of 20".

From the remaining boxes, create 4 buckets:

1: blue, 20 inches width
2: not blue, 20 inches width
3: blue, NOT 20 inches width
4: not blue, NOT 20 inches width

The conditions to satisfy:

C1: 4 to 16 boxes from bucket 1 or 3
C2: 1 to 6 boxes from bucket 1 or 2
C3: 20 boxes altogether from bucket 1, 2, 3, 4

Then, not most efficient, but simple:

int b1, b2, b3, b4;
int B1 = size_of_bucket_1;
int B2 = size_of_bucket_2;
int B3 = size_of_bucket_3;
int B4 = size_of_bucket_4;

    for(b1=0; b1<=6; b1  )
    {
       for(b2=0; b2<=6; b2  )
       {
          for(b3=0; b3<=16; b3  )
          {
             if(b1<=B1 &&
                b2<=B2 &&
                b3<=B3 &&
                b4<=B4 &&
                b1 b2>=1 && b1 b2<=6  &&
                b1 b3>=4 && b1 b3<=16 && 
                b1 b2 b3<=20 && b4>=20-b1-b2-b3)
             {
                // Debug Solution Print
                // printf("%i %i %i %i\n", b1, b2, b3, 20-b1-b2-b3);
                return true; // Match found
             }      
          }
       }
    }
    return false; // No match possible

Altogether 7 * 7 * 17 = 833 cases to check. That is not too bad, for choosing any 20 boxes out of up to 1500. (It is possible to reduce the number of cases to check, with b1 fix in outer loop, the inner loops (intervals for b2 and b3) could be reduced.)

For a more general case (other conditions), but with 4 buckets and 20 boxes to choose altogether: A check of 21 * 21 * 21 * 21 = 194481 cases would usually still be good enough. Anyway, in fact with total number of boxes being fixed to 20, it is just 21 * 21 * 21 = 9261 cases

According to this, from a mathematical point of view, the problem can be reduced such that only a useful number of cases needs to be checked, if there is match. (While any 20 out of 1500 would result in a huge, huge number of cases impossible to ever check). Anyway, it might still be a challenge to create a program which covers "the general case" - using other conditions.