Parallelization of bin packing problem by OpenMp


I am learning OpenMP, and I want to parallelize the well-known bin packing problem. But the problem is that whatever I try, I can't get a correct solution (the one I get with the sequential version).

So far, I have tried multiple different versions (including reduction, tasks, and schedule clauses), but didn't get anything useful.

Below is my most recent try.


int binPackingParallel(std::vector<int> weight, int n, int c)
{
    int result = 0;
    int bin_rem[n];   // remaining capacity of each open bin (VLA: GCC extension)
    #pragma omp parallel for schedule(dynamic) reduction(+:result)
    for (int i = 0; i < n; i++) {
        bool done = false;
        int j;
        // try to place item i into the first open bin that still fits it
        for (j = 0; j < result && !done; j++) {
            int b;
            #pragma omp atomic
            b = bin_rem[j] - weight[i];
            if (b >= 0) {
                bin_rem[j] = bin_rem[j] - weight[i];
                done = true;
            }
        }
        // no open bin fits: open a new bin for item i
        if (!done) {
            #pragma omp critical
            bin_rem[result] = c - weight[i];
            result++;
        }
    }
    return result;
}


Edit: I modified the original problem, so now a number of bins N is given and we need to check whether all elements can be placed in N bins. I implemented this using recursion, but my parallel version is still slower.

bool can_fit_parallel(std::vector<int> arr, std::vector<int> bins, int n) {
    // base case: if the array is empty, we can fit the elements
    if (arr.empty()) {
        return true;
    }

    bool found = false;
    #pragma omp parallel for schedule(dynamic, 10)
    for (int i = 0; i < n; i++) {
        if (bins[i] >= arr[0]) {
            bins[i] -= arr[0];
            if (can_fit_parallel(std::vector<int>(arr.begin() + 1, arr.end()), bins, n)) {
                found = true;
                #pragma omp cancel for
            }
            // if the element doesn't fit or if the recursion fails,
            // restore the bin's capacity and try the next bin
            bins[i] += arr[0];
        }
    }

    // if the element doesn't fit in any of the bins, return false
    return found;
}

Any help would be great.

CodePudding user response:

This is not a reduction: that would give each thread its own partial result, but you want result to be global. I think that putting a critical section around the two statements (the write to bin_rem[result] and the increment of result) might work. The atomic statement is meaningless, since the variable it updates (b) is not shared.
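
For illustration, a minimal sketch of that suggestion, assuming result is an ordinary shared counter (i.e. the reduction clause is removed) and bin_rem is shared as in your code:

if (!done) {
    // A named critical section makes "claim a slot, then bump the
    // counter" one indivisible step, so two threads cannot open the
    // same bin index. This fixes that particular race, but not the
    // ordering problem discussed below: other threads still read
    // result in their inner loop condition without synchronization.
    #pragma omp critical(open_new_bin)
    {
        bin_rem[result] = c - weight[i];
        result++;
    }
}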

But there is a deeper problem: each i iteration can write a result, which affects how far the search of the other iterations goes. That means that the outer iteration has to be sequential. (You really need to think hard about whether iterations are independent before you slap a parallel directive on them!) Maybe you can make the inner iteration parallel: it's a search, which would be a reduction on j. However, that loop would have to be pretty dang long before you'd see a performance improvement.
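
As a rough sketch of that idea (the helper name findFirstFit and the n_bins parameter are illustrative, not from the question):

// Parallel first-fit search expressed as a min-reduction on the index j
// (requires OpenMP 3.1+ for min reductions in C/C++). Each thread keeps
// the smallest fitting index it has seen; the reduction combines them.
int findFirstFit(const int *bin_rem, int n_bins, int w)
{
    int first = n_bins;   // n_bins means "no bin fits"
    #pragma omp parallel for reduction(min : first)
    for (int j = 0; j < n_bins; j++) {
        if (bin_rem[j] >= w && j < first)
            first = j;
    }
    return first;   // smallest j with bin_rem[j] >= w, or n_bins if none
}

The outer loop would still place one item at a time sequentially, and as noted, the number of bins would have to be very large before this pays off.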

This looks to me like the sort of algorithm that you'd have to reformulate before you can make it parallel.

CodePudding user response:

You do not need parallelization to make your code significantly faster. You have implemented the First Fit method (its complexity is O(n²)), but it can be significantly faster if you use a binary search tree (O(n log n)). To do so, you just have to use the standard library (std::multiset); in this example I have implemented the Best Fit algorithm:

#include <set>
#include <vector>

int binPackingSTL(const std::vector<int>& weight, const int n, const int c)
{
    std::multiset<int> bins;                 // multiset of remaining bin capacities
    for (const auto x : weight) {
        const auto it = bins.lower_bound(x); // find the best bin to accommodate x
        if (it == bins.end()) {
            bins.insert(c - x);              // if no suitable bin found, open a new one
        } else {
            // suitable bin found - replace it with a smaller value
            auto value = *it;                // store its remaining capacity
            bins.erase(it);                  // erase the old value
            bins.insert(value - x);          // insert the reduced capacity
        }
    }
    return bins.size();                      // number of bins used
}
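
A hypothetical usage sketch (the weights and capacity below are made up for illustration):

#include <cstdio>

int main()
{
    std::vector<int> weight = {4, 8, 1, 4, 2, 1};       // example item weights
    const int c = 10;                                   // bin capacity
    int used = binPackingSTL(weight, weight.size(), c); // Best Fit packing
    std::printf("bins used: %d\n", used);               // prints 2 for this input
}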

In my measurements, it is about 100x faster than your code in the case of n = 50000.

EDIT: Your revised question cannot be answered because (as far as I know) there is no known polynomial-time algorithm that guarantees to find the optimal solution to the bin packing problem; it is NP-hard, so exact methods like your recursive search take exponential time in the worst case.
