Home > Mobile >  Calculate the number of fixed-length strings that contain a given word exactly once
Calculate the number of fixed-length strings that contain a given word exactly once

Time:10-09

For example given the fixed-length N=10 and the word W="radar". Also we know we have 26 characters.

Then we can "easily" calculate the number of strings as:

radar _ _ _ _ _ -> 26^5-26-1 (because radarradar and radaradar_ is not ok)

_ radar _ _ _ _ -> 26^5-26

_ _ radar _ _ _ -> 26^5

_ _ _ radar _ _ -> 26^5

_ _ _ _ radar _ -> 26^5-26

_ _ _ _ _ radar -> 26^5-26-1

Which is 71288150. Is there a better solution to calculate this? Even, if N can be very large.

CodePudding user response:

This answer is largely based on the DFA for radar

To get the answer, we need to count the frequencies of being in each possible state (starting from 0) after seeing length = 10 letters-- the good states are n, n 1 ... 2n-1, which correspond to exactly one full match. To generate the transition matrix, we can use the prefix function from the KMP algorithm, with almost no modifications. The definition of the DFA for the first row is as follows:

For state 0 <= i < n,
DFA[i][letter] = Length of longest suffix of (pattern[:i] letter)
                 that is also a prefix of pattern

with the caveat that a full match in state n-1 moves us to row 1. The definition for states n <= i < 2n is the same, just with every state shifted up by n.

Using numpy, we can get an answer using matrix powers fairly quickly: we need to raise a roughly 2nx2n matrix to the power of our fixed length. If length is too large, you'll either need to use Python's big integers or take moduli to avoid overflow, once the answer gets too large. However, matrix exponentiation can be done with repeated squaring for a time complexity of O(n^3 * log(length)) using naive matrix multiplication.

def kmp_count(pattern: str, length: int) -> int:
    if not pattern.islower():
        raise ValueError("Pattern must be lowercase")

    n = len(pattern)

    if n > length:
        return 0
    if n == length:
        return 1
    if n == 1:
        return length * pow(25, length - 1)

    pattern_chars = [ord(x) - ord('a') for x in pattern]

    # Create the DFA
    dfa = [[0 for _ in range(26)] for _ in range(2 * n   1)]

    # Final failure state is an absorbing state
    dfa[2 * n] = [2 * n for _ in range(26)]

    dfa[0][pattern_chars[0]] = 1
    restart_state = 0
    for i in range(1, n):
        dfa[i] = dfa[restart_state][:]

        dfa[i][pattern_chars[i]] = i   1
        restart_state = dfa[restart_state][pattern_chars[i]]

    dfa[n - 1][pattern_chars[n - 1]]  = restart_state

    for i in range(n, 2 * n):
        dfa[i] = [x   n for x in dfa[i - n]]

    # Two or more matches moves us to fail state
    dfa[2 * n - 1][pattern_chars[n - 1]] = 2 * n

    transitions = np.zeros((2 * n   1, 2 * n   1), dtype=np.int64)
    for i, x in enumerate(dfa):
        for y in x:
            transitions[i][y]  = 1

    final_transitions = np.linalg.matrix_power(transitions, length)
    return final_transitions[0, n:2 * n].sum()
print(kmp_count(pattern='radar', length=10))
>>> 71288150
  • Related