Comparing various python implementations of solution to longest common subsequence puzzle-CodePudding

I was going through common child problem on hackerrank. It is exactly same as a famous Longest Common Subsequence problem.

Using basic concept from CLRS, I implemented first solution. But exceeds time limit on some test cases:

def lcs1(s1, s2):
    LCS_table = [[0 for _ in range(len(s2) 1)] # s2 corresponds to columns
                 for _ in range(len(s1) 1) # s1 corresponds to rows
                ]              
    for i in range(1,len(s1) 1): 
        for j in range(1,len(s2) 1): 
            if s1[i-1] == s2[j-1]: 
                LCS_table[i][j] = LCS_table[i-1][j-1]   1
            else:
                LCS_table[i][j] = LCS_table[i-1][j] if LCS_table[i-1][j] > LCS_table[i][j-1] else LCS_table[i][j-1]
    return LCS_table[-1][-1]

So I went through the leaderboard of hackerrank and come up with second solution which passes all testcases:

def lcs2(s1,s2):
    prev = cur = [0] * (len(s2) 1) 
    for i in range(1, len(s1) 1):
        for j in range(1, len(s2) 1):
            if s1[i-1] == s2[j-1]:
                cur[j] = prev[j-1] 1
            else:
                cur[j] = prev[j] if prev[j] > cur[j-1] else cur[j-1]
        prev, cur = cur, [0] * (len(s2) 1)
    return prev[-1]

Q1. Is it because indexing 2D array in lcs1 is costlier than indexing 1D array in lcs2?

I also referred this solution:

def lcs3(s1, s2):
    L = [0] * len(s2)
    for c in s1:
        p = 0
        L = [
            p := (q 1 if c==d else r if p < r else p)
            for d, q, r in zip(s2, [0] L, L)
        ]
    return L[-1]

I chosen two random strings and tried these methods on them:

s1='onfjkvbibisbwbriuiwbviwovwwwwmpomqqsbambviubivbsvbwb'
s2='ieubrvbwoqpvponvnxvknoibolkncnsnbvrobrobwvkskdvksvorbqppqwqwqzcx'
print(lcs1(s1,s2)) # 18
print(lcs2(s1,s2)) # 19
print(lcs3(s1,s2)) # 18

Q2. Surprisingly the output of lcs2 differs from that of lcs1 and lcs3. Where did I make mistake in lcs2? Still how it got accepted in hackerrank? Lack of good test cases?

CodePudding user response：

The lcs2 version tries to be more memory efficient than lcs1 is. The lcs1 version has a 2D matrix, but we can see that in one iteration of the outer loop, only 2 rows in the table are actually used: the row with index i-1 serves for updating row with index i. That means that for a given value if i, the rows with indices less than i-1 are no longer used in the rest of algorithm. That is space that could have been freed. Only two rows matter during the whole process.

And that is what lcs2 tried to do. It gives the row with index i-1 the name prev, and the row with index i it names curr. Then in the next iteration, prev will be what curr was in the previous iteration, and curr gets a new list to work with which is only allocated when it is now needed. The previous list that prev referenced is then no longer accessible and can be garbage collected.

So far the good intentions of lcs2. But as you found out, it has a problem. Maybe someone broke the original code of lcs2, thinking it could be simplified a bit. Here is the issue:

    prev = cur = [0] * (len(s2) 1)

This creates one list, not two. And both prev and curr reference it. That is a problem, because in the first iteration every mutation of the curr list, will be a mutation in the prev list, because they are the same list! This is not what is supposed to happen.

If you change this piece of initialisation to this:

    prev = [0] * (len(s2) 1) 
    cur = [0] * (len(s2) 1)

...it will work fine, and return the same results as lcs1 and lcs3.