I was going through common child problem on hackerrank. It is exactly same as a famous Longest Common Subsequence problem.
Using basic concept from CLRS, I implemented first solution. But exceeds time limit on some test cases:
def lcs1(s1, s2):
LCS_table = [[0 for _ in range(len(s2) 1)] # s2 corresponds to columns
for _ in range(len(s1) 1) # s1 corresponds to rows
]
for i in range(1,len(s1) 1):
for j in range(1,len(s2) 1):
if s1[i-1] == s2[j-1]:
LCS_table[i][j] = LCS_table[i-1][j-1] 1
else:
LCS_table[i][j] = LCS_table[i-1][j] if LCS_table[i-1][j] > LCS_table[i][j-1] else LCS_table[i][j-1]
return LCS_table[-1][-1]
So I went through the leaderboard of hackerrank and come up with second solution which passes all testcases:
def lcs2(s1,s2):
prev = cur = [0] * (len(s2) 1)
for i in range(1, len(s1) 1):
for j in range(1, len(s2) 1):
if s1[i-1] == s2[j-1]:
cur[j] = prev[j-1] 1
else:
cur[j] = prev[j] if prev[j] > cur[j-1] else cur[j-1]
prev, cur = cur, [0] * (len(s2) 1)
return prev[-1]
Q1. Is it because indexing 2D array in lcs1
is costlier than indexing 1D array in lcs2
?
I also referred this solution:
def lcs3(s1, s2):
L = [0] * len(s2)
for c in s1:
p = 0
L = [
p := (q 1 if c==d else r if p < r else p)
for d, q, r in zip(s2, [0] L, L)
]
return L[-1]
I chosen two random strings and tried these methods on them:
s1='onfjkvbibisbwbriuiwbviwovwwwwmpomqqsbambviubivbsvbwb'
s2='ieubrvbwoqpvponvnxvknoibolkncnsnbvrobrobwvkskdvksvorbqppqwqwqzcx'
print(lcs1(s1,s2)) # 18
print(lcs2(s1,s2)) # 19
print(lcs3(s1,s2)) # 18
Q2. Surprisingly the output of lcs2
differs from that of lcs1
and lcs3
. Where did I make mistake in lcs2? Still how it got accepted in hackerrank? Lack of good test cases?
CodePudding user response:
The lcs2
version tries to be more memory efficient than lcs1
is. The lcs1
version has a 2D matrix, but we can see that in one iteration of the outer loop, only 2 rows in the table are actually used: the row with index i-1
serves for updating row with index i
. That means that for a given value if i
, the rows with indices less than i-1
are no longer used in the rest of algorithm. That is space that could have been freed. Only two rows matter during the whole process.
And that is what lcs2
tried to do. It gives the row with index i-1
the name prev
, and the row with index i
it names curr
. Then in the next iteration, prev
will be what curr
was in the previous iteration, and curr
gets a new list to work with which is only allocated when it is now needed. The previous list that prev
referenced is then no longer accessible and can be garbage collected.
So far the good intentions of lcs2
. But as you found out, it has a problem. Maybe someone broke the original code of lcs2
, thinking it could be simplified a bit. Here is the issue:
prev = cur = [0] * (len(s2) 1)
This creates one list, not two. And both prev
and curr
reference it. That is a problem, because in the first iteration every mutation of the curr
list, will be a mutation in the prev
list, because they are the same list! This is not what is supposed to happen.
If you change this piece of initialisation to this:
prev = [0] * (len(s2) 1)
cur = [0] * (len(s2) 1)
...it will work fine, and return the same results as lcs1
and lcs3
.