Problem statement

The algorithm receives a list of plank lengths [ L0, ..., L_n-1 ] and a list of desired cut pieces lengths [M0, ..., M_m-1]. It returns a vector [P0, ..., P_m-1] where p_j is the index of the plank where the piece j is cut from. P_j = -1 if that piece isn't cut.

The greedy algorithm is:

Choose the biggest piece j and the longest plank i;
Remove the piece j from the list of desired cut pieces.
If M_j > L_i, then piece j isn't cut. So we set P_j = -1 and return to step 1.
We cut the piece j from the plank i so its length becomes L_i - M_j. We set P_j = i;
If there's pieces left to cut, go back to step 1.

The problem is to implement this greedy algorithm in O(m log n) time using a heap.

It's given that m >= n.

Question

So my question is, using a heap and seeing that we need the biggest piece j and the longest plank i, something tells me that we need to sort the vectors hence I don't see how we can achieve this algorithm in O(m log n).

Could someone enlighten me on how this can be done in O(m log n) time?

I think the use of the heap can be 1 heap for the planks and 1 for the pieces or only 1 heap but there must be at least one heap.

CodePudding user response：

If you restrict yourself to a comparison based sort then I don't think you can do any better than O(m log m) (which is worse than O(m log n)). This is because the first step ("choosing biggest piece j") effectively produces a sorted sequence of pieces of length m -- as you pointed out correctly.

However, you can do this in O(m log n) if you use a linear-time sort (e.g. radix sort) for the pieces M_j, and a heap for the planks L_i. The sort of the pieces would take O(m). You will do m push/pop operations on the heap, whose size is bound by the number of planks n. Thus the operations on that heap will total to O(m log n). Together with the previous sorting that's still O(m log n).

CodePudding user response：

This straightforward implementation is O(m log m). This is unavoidable since the problem requires to process cuts from large to small. This requires sorting m elements, which is O(m log m). There is no faster sorting algorithm.

typedef std::pair<size_t,size_t> pss;
std::vector<int> greedy(const std::vector<size_t>& L, const std::vector<size_t>& M)
{
    std::vector<int> result(M.size(), -1);
    // heap of pairs that sorts on pair.first, top==largest element
    std::priority_queue<pss> Lheap, Mheap;
    // create heaps of value => index
    for (size_t i = 0; i < L.size();   i)
        Lheap.push(pss{L[i],i});
    for (size_t i = 0; i < M.size();   i)
        Mheap.push(pss{M[i],i});
    // process cuts from largest to smallest
    while (!Mheap.empty() && !Lheap.empty())
    {
        // take longest plank & cut
        auto mi = Mheap.top();
        auto li = Lheap.top();
        Mheap.pop();
        // check if cut is too big
        if (mi.first > li.first)
            continue;
        // write plank index to cut index
        result[mi.second] = li.second;
        // update heap: remove old plank, reinsert with new length
        Lheap.pop();
        li.first -= mi.first;
        if (li.first > 0)
            Lheap.push(li);
    }
    return result;
}

ONLY IF at least the M input vector is presorted then of course the O(m log m) cost is moved to the precondition. In that case you only need the Lheap, removing Mheap and its O(m log m) cost, you simply iterate over M. Thus you end up with O(m log n): m loop iterations with Lheap operations of cost O(log n).

CodePudding user response：

Build max heaps out of the n plank lengths and m cut lengths in O(n m) time (you can build a heap of k elements in O(k) time if you build it all at once rather than inserting elements one at a time, and you have all elements up front as inputs to the algorithm, which is why building the heaps can be done in linear time).

Now, using standard heap operations, you can remove a cut length from the cut-length heap, MH, in O(logm) time, and you can remove a plank length from the plank-length heap, LH, in O(logn) time. Also, you can decrease a plank-length value by removing it and then re-inserting the lesser value in O(logn logn) = O(2logn) = O(logn) time.

One note: in these heaps, you'll want to store indexes into the original arrays as well as lengths. Your heaps will be structured by length (longer lengths on top), but the individual elements of the heap need to be structs/pairs/objects which store a cut/plank lengths alongside an index. If you use a cpp pair, you can put lengths first and indexes second and the heap will be ordered correctly (as a max-heap sorted according to the first pair element) by default.

Revisiting the steps of the algorithm, it looks like that the best complexity you can achieve with a heap-based approach is O(mlogm) time:

Peek the top element of LH = {L_j, j} and MH = {M_i, i}: O(1)
Pop {M_i, i} from MH: O(logm)
Check if the highest cut length, M_i is greater than the highest plank length, L_j. If M_i > L_j, we can't cut that length. Set P[j] = -1 and return to step 1: O(1)
Cut length M_i from plank L_j by popping LH (O(logn)) and inserting {L_j - M_i, j} back into it (also O(logn)). Set P[j] = i (O(1)): O(logn)
If there's pieces left to cut (if !MH.empty() && !LH.empty()), go back to step 1: O(1)

Every execution of the loop, we do log(m) log(n) O(1) = O(logm) work. The loop runs at most max(n,m) = m times. So the overall complexity is O(n m mlogm) = O(mlogm). You can't do better than this unless M is pre-sorted (in which case you wouldn't need MH, and could instead just keep an index into M and the O(logm) step 2 could be replaced by O(1) {M_i, i} = M[mIdx ], reducing the overall complexity to O(m*logn)).

It looks like another answer already posted an implementation this algorithm, but I'll still add my answer as complexity analysis.