Home > Back-end >  Recursion - Longest Common Subsequence with Restriction on N number of substrings
Recursion - Longest Common Subsequence with Restriction on N number of substrings

Time:10-16

I am really stuck on a question that I am given a String input S, another String input U that represent the entire substring and a non-negative int number m. The method is lcs(S, U, m).

For example: Let S="aabacaab", U="aab", and m=2. Then we have the following substring combinations:

[a][ab]acaab

[a]abaca[ab]

a[a]baca[ab]

aab[a]ca[ab]

aabac[a][ab]

[aa][b]acaab

[aa]bacaa[b]

aabac[aa][b]

So lcs(S, U, m) returns 8 as we have 8 different combinations of substrings of U with the restriction of m number of substrings.

My initial thinking is to compare the first character of S with the first character of U and recurse down by shortening both S and U. However, due to m, I am unable to determine how m should be reduced as the shortened S and U don't have any reference to the previous state.

CodePudding user response:

When dealing with dynamic programming on strings, the answer to "What should my subproblems be" is often prefixes, suffixes, or substrings.

From your final note, you've correctly identified that we can solve the original problem by solving the problem on suffixes. We also know that our subproblem has to know the number of pieces we have left to use. At this point, we can guess what the subproblems are and try to define a formula.

We can let f(i,j,k) be the number of ways to match the last j letters of U using the last i letters of S into exactly k substrings. What are the edge cases? If j and k are both 0, we have nothing to do; there's 1 way to do nothing. If i<j we cannot match the j letters, and if j<k we cannot split the j letters into enough pieces.

Finally, you have to define the main recursion. First, we can skip this letter of S, which gives f(i-1,j,k) as a summand. Now, let Match-Length(i,j) be the number of consecutive places at the start of S[-i:] that match U[-j:]. We need to add on all the possible matches: for every length l, 1 <= l <= Match-Length(i,j), we have to add f(i-l, j-l, k-1). All that's left is to check that this covers all cases.

CodePudding user response:

To shorten m, you will have to look at the current letters in S and U. If both letters are the same you have two considers two (non exclusive) situations:

  • you are currently matching a sub-string of S (which means, using your notations, that you have opened a string, like [aa...) in which case you can continue that matching

  • you still have a strictly positive number of sub-strings to match in which case you can start a new matching (whatever you are currently matching a string or not)

For that, you will need an additional parameter for your recursion which specifies if you are currently matching a sub-string.

The base case indicating that you have found a match to your problem is when you have reached the end of U and m is equal to 0.

Here is the code that implements this solution (note that the function handles list of letters rather than strings):

#!/usr/bin/env python3

import functools

def lcs(s, u, m) -> int:

    # recursive function
    #   i: index in list s
    #   j: index in list j
    #   m: number of sub-strings to find
    #   match: are we currently matching a sub-string or not?
    @functools.lru_cache(maxsize=10000)  #  cache the result of our function
    def loop(i, j, m, match) -> int:
        # base cases
        if j >= len(u):  # end of u reached => no more letter to look for
            if m == 0: # check if have found our m sub-strings
                return 1
            return 0
        if i >= len(s):  # end of s reached and still letter to look for 
            return 0

        result = loop(i   1, j, m, False)  # skip the current letter

        # letter in s and u are the same (we will move both i and j)
        if s[i] == u[j]:
            if match:  # we are currently matching a sub-string => continue
                result  = loop(i   1, j   1, m, True)
            if m > 0:  # we can start a new sub-string
                result  = loop(i   1, j   1, m - 1, True)
        return result

    return loop(0, 0, m, False)

print(lcs(list("aabacaab"), list("aab"), 2))
  • Related