Counting the number of ways to make up a string-CodePudding

I have just started learning dynamic programming and was able to do some of the basic problems, such as fibbonaci, the knapsack and a few more problems. Coming across the problem below, I got stuck and do not know how to proceed forward. What confuses me is what would be the base case in this case, and the overlapping problems. Not knowing this prevents me from developing a relation. They are not as apparent in this example as they were in the previous ones I have solved thus far.

Suppose we are given some string origString, a string toMatch and some number maxNum greater than or equal to 0. How can we count in how many ways it is possible to take maxNum number of nonempty and nonoverlapping substrings of the string origString to make up the string toMatch?

Example:

If origString = "ppkpke", and toMatch = "ppke"

maxNum = 1: countWays("ppkpke", "ppke", 1) will give 0 because toMatch is not a substring of origString.

maxNum = 2: countWays("ppkpke", "ppke", 2) will give 4 because 4 different combinations of 2 substring made up of "ppkpke" can make "ppke". Those strings are "ppk" & "e", "pp" & "ke" , "p" & "pke" (excluding "p") and "p" & "pke" (excluding "k")

CodePudding user response：

Here is a recursive solution.

Compares the first character of source and target, and if they're equal, choose to either take it (advancing by 1 char in both strings) or not take it (advancing by 1 char in source but not in target). The value of k is decremented everytime a new substring is created; there is an additional variable continued which is True if we're in the middle of building a substring, and False otherwise.

def countWays(source, target, k, continued=False):
  if len(target) == 0:
    return (k == 0)
  elif (k == 0 and not continued) or len(source) == 0:
    return 0
  elif source[0] == target[0]:
    if continued:
      return countWays(source[1:], target[1:], k, True)   countWays(source[1:], target[1:], k-1, True)   countWays(source[1:], target, k, False)
    else:
      return countWays(source[1:], target[1:], k-1, True)   countWays(source[1:], target, k, False)
  else:
    return countWays(source[1:], target, k, False)

print(countWays('ppkpke', 'ppke', 1))
# 0
print(countWays('ppkpke', 'ppke', 2))
# 4
print(countWays('ppkpke', 'ppke', 3))
# 8
print(countWays('ppkpke', 'ppke', 4))
# 4
print(countWays('ppkpke', 'ppke', 5))
# 0

CodePudding user response：

As an initial word of caution, I’d say that although my solution happens to match the expected output for the tiny test set, it is very likely wrong. It’s up to you to double-check it on other examples you may have etc.

The algorithm walks the longer string and tries to spread the shorter string over it. The incremental state of the algorithm consists of tuples of 3 elements:

long string coordinate i (origString[i] == toMatch[j])
short string coordinate j (origString[i] == toMatch[j])
number of ways we made it into that^^^ state

Then we just walk along the strings over and over again, using stored, previously discovered state, and sum up the total number(s) of ways each state was achieved — in the typical dynamic programming fashion.

For a state to count as a solution, j must be at the end of the short string and the number of iterations of the dynamic algorithm must be equivalent to the number of substrings we wanted at that point (because each iteration added one substring).

It is not entirely clear to me from the assignment whether maxNum actually means something like “exactNum”, i.e. exactly that many substrings, or whether we should sum across all lower or equal numbers of substrings. So the function returns a dictionary like { #substrings : #decompositions }, so that the output can be adjusted as needed.

#!/usr/bin/env python

def countWays(origString, toMatch, maxNum):
  origLen = len(origString)
  matchLen = len(toMatch)
  state = {}
  for i in range(origLen):
    for j in range(matchLen):
      o = i   j
      if origString[o] != toMatch[j]:
        break
      state[(o, j)] = 1
  sums = {}
  for n in range(1, maxNum):
    if not state:
      break
    nextState = {}
    for istart, jstart in state:
      prev = state[(istart, jstart)]
      for i in range(istart   1, origLen):
        for j in range(jstart   1, matchLen):
          o = i   j - jstart - 1
          if origString[o] != toMatch[j]:
            break
          nextState[(o, j)] = prev   nextState.get((o, j), 0)
    sums[n] = sum(state[(i, j)] for i, j in state if j == matchLen - 1)
    state = nextState
  sums[maxNum] = sum(state[(i, j)] for i, j in state if j == matchLen - 1)
  return sums

result = countWays(origString='ppkpke', toMatch='ppke', maxNum=5)

print('for an exact number of substrings:', result)
print(' for up to a number of substrings:', {
  n: s for n, s in ((m, sum(result[k] for k in range(1, m   1)))
                    for m in range(1, 1   max(result.keys())))})

This^^^ code is a quick and ugly hack and nothing more. There is a huge room for improvement, including (but not limited to) the use of generator functions (yield), the use of @memoize etc. Here’s some output:

for an exact number of substrings: {1: 0, 2: 4, 3: 8, 4: 4, 5: 0}
 for up to a number of substrings: {1: 0, 2: 4, 3: 12, 4: 16, 5: 16}

It would be an interesting (and nicely challenging) exercise to store a bit more of the dynamic state (e.g. to keep it for each n) and then reconstruct and pretty-print (efficiently) the exact string (de)compositions that were counted.