Trying to understand the time complexity of this dynamic recursive subset sum

Time:12-02

# Returns true if there exists a subsequence of `A[0…n]` with the given sum
def subsetSum(A, n, k, lookup):
 
    # return true if the sum becomes 0 (subset found)
    if k == 0:
        return True
 
    # base case: no items left, or sum becomes negative
    if n < 0 or k < 0:
        return False
 
    # construct a unique key from dynamic elements of the input
    key = (n, k)
 
    # if the subproblem is seen for the first time, solve it and
    # store its result in a dictionary
    if key not in lookup:
 
        # Case 1. Include the current item `A[n]` in the subset and recur
        # for the remaining items `n-1` with the decreased total `k-A[n]`
        include = subsetSum(A, n - 1, k - A[n], lookup)
 
        # Case 2. Exclude the current item `A[n]` from the subset and recur for
        # the remaining items `n-1`
        exclude = subsetSum(A, n - 1, k, lookup)
 
        # assign true if we get subset by including or excluding the current item
        lookup[key] = include or exclude
 
    # return solution to the current subproblem
    return lookup[key]
 
 
if __name__ == '__main__':
 
    # Input: a set of items and a sum
    A = [7, 3, 2, 5, 8]
    k = 14
 
    # create a dictionary to store solutions to subproblems
    lookup = {}
 
    if subsetSum(A, len(A) - 1, k, lookup):
        print('Subsequence with the given sum exists')
    else:
        print('Subsequence with the given sum does not exist')

It is said that the complexity of this algorithm is O(n * sum), but I can't understand how or why. Can someone help me? A wordy explanation or a recurrence relation, anything is fine.

CodePudding user response:

The simplest explanation I can give is to realize that when lookup[(n, k)] has a value, it is True or False and indicates whether some subset of A[:n+1] sums to k.

Imagine a naive algorithm that just fills in all the elements of lookup row by row.

lookup[(0, i)] (for 0 ≤ i ≤ total) has just two true elements, i = A[0] and i = 0; all the other elements are false.

lookup[(1, i)] (for 0 ≤ i ≤ total) is true if lookup[(0, i)] is true, or if i ≥ A[1] and lookup[(0, i - A[1])] is true. I can reach the sum i either by using A[1] or not, and I've already calculated both of those.

... lookup[(r, i)] (for 0 ≤ i ≤ total) is true if lookup[(r - 1, i)] is true, or if i ≥ A[r] and lookup[(r - 1, i - A[r])] is true.

Filling in the table this way, it is clear that we can completely fill the lookup table for rows 0 ≤ r < len(A) in time O(len(A) * total), since filling in each element takes constant time. And our final answer is just checking whether lookup[(len(A) - 1, k)] is True in the table.
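The row-by-row filling described above can be sketched directly. This is the bottom-up version of the argument, not the code from the question, and the name `subset_sum_table` is hypothetical:

```python
# Bottom-up sketch of the table-filling argument (hypothetical name
# subset_sum_table): fill lookup[(r, i)] row by row.
def subset_sum_table(A, total):
    lookup = {}
    # Row 0: using A[0] alone, only the sums 0 and A[0] are reachable.
    for i in range(total + 1):
        lookup[(0, i)] = (i == 0 or i == A[0])
    # Row r: sum i is reachable if it was reachable without A[r],
    # or if i - A[r] was reachable in the previous row.
    for r in range(1, len(A)):
        for i in range(total + 1):
            lookup[(r, i)] = (lookup[(r - 1, i)]
                              or (i >= A[r] and lookup[(r - 1, i - A[r])]))
    # Each of the len(A) * (total + 1) cells costs O(1): O(n * sum) total.
    return lookup[(len(A) - 1, total)]

print(subset_sum_table([7, 3, 2, 5, 8], 14))  # True, e.g. 7 + 2 + 5
```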

Your program is doing the exact same thing, but calculating the values of the lookup entries only as they are needed.

CodePudding user response:

Sorry for submitting two answers. I think I came up with a slightly simpler explanation.

Take your code and imagine putting the three lines inside if key not in lookup: into a separate function, calculateLookup(A, n, k, lookup). I'm going to define "the cost of calling calculateLookup for a specific value of n and k" to be the total time spent in the call to calculateLookup(A, n, k, lookup), excluding any recursive calls to calculateLookup.

The key insight is that, as defined above, the cost of calling calculateLookup() for any n and k is O(1). Since we exclude recursive calls from the cost, and there are no loops in the function, the cost of calculateLookup is just the cost of executing a few tests.

The entire algorithm does a fixed amount of work, calls calculateLookup, and then does a small amount of work. Hence asking how much time is spent in our code is the same as asking: how many times do we call calculateLookup?

Now we're back to the previous answer. Because of the lookup table, calculateLookup is called at most once for each distinct value of (n, k). We also know that we check the bounds of n and k before each call to calculateLookup, so 1 ≤ k ≤ sum and 0 ≤ n < len(A). So calculateLookup is called at most (len(A) * sum) times.
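To see that bound concretely, here is an instrumented sketch (the name `subset_sum_counted` and the explicit counter are hypothetical additions, not part of the original code) that counts how many distinct (n, k) subproblems the memoized recursion actually solves:

```python
# Instrumented sketch (hypothetical name subset_sum_counted): counter[0]
# is incremented once per cache miss, i.e. once per "calculateLookup" call.
def subset_sum_counted(A, n, k, lookup, counter):
    if k == 0:
        return True
    if n < 0 or k < 0:
        return False
    key = (n, k)
    if key not in lookup:
        counter[0] += 1  # a distinct (n, k) subproblem is being solved
        include = subset_sum_counted(A, n - 1, k - A[n], lookup, counter)
        exclude = subset_sum_counted(A, n - 1, k, lookup, counter)
        lookup[key] = include or exclude
    return lookup[key]

A, k = [7, 3, 2, 5, 8], 14
counter = [0]
subset_sum_counted(A, len(A) - 1, k, {}, counter)
print(counter[0], "subproblems solved; the bound is", len(A) * k)
```

Each cache miss stores exactly one new key, so the number of misses can never exceed the number of distinct (n, k) pairs.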


In general, for these algorithms that use memoization/caching, the easiest thing to do is to separately calculate and then sum:

  1. How long things take assuming all values you need are cached.
  2. How long it takes to fill the cache.

The algorithm you presented is just filling up the lookup cache. It's doing it in an unusual order, and it's not filling every entry in the table, but that's all it's doing.


The code would be slightly faster with

lookup[key] = subsetSum(A, n - 1, k - A[n], lookup) or subsetSum(A, n - 1, k, lookup)

This doesn't change the worst-case O() of the code, but it can avoid some unnecessary calculations, because `or` short-circuits.
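Python's `or` evaluates its right operand only when the left is falsy, so the one-line version skips the exclude call entirely whenever the include branch already succeeds. A small sketch (the helper `make_solver` is hypothetical, not from the original post) that runs both variants and reports their call counts:

```python
# Sketch comparing the two variants; short_circuit=True uses the one-line
# form, which skips the exclude call whenever include returns True.
def make_solver(short_circuit):
    calls = [0]  # total number of recursive calls made

    def solve(A, n, k, lookup):
        calls[0] += 1
        if k == 0:
            return True
        if n < 0 or k < 0:
            return False
        if (n, k) not in lookup:
            if short_circuit:
                lookup[(n, k)] = (solve(A, n - 1, k - A[n], lookup)
                                  or solve(A, n - 1, k, lookup))
            else:
                include = solve(A, n - 1, k - A[n], lookup)
                exclude = solve(A, n - 1, k, lookup)
                lookup[(n, k)] = include or exclude
        return lookup[(n, k)]

    return solve, calls

A, k = [7, 3, 2, 5, 8], 14
for flag in (False, True):
    solve, calls = make_solver(flag)
    print("short_circuit =", flag, "->", solve(A, len(A) - 1, k, {}),
          "in", calls[0], "calls")
```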
