Number of Palindromic Subsequences of length 5 in Binary String-CodePudding

Given a string s, which consists of characters "0" and "1" only, return the number of subsequences of length 5, which are palindromes.

As the number can be really big, return the answer mod(10^9 7)

Test case 1:

input: "0100110"

output: 5

indices(1,2,3,6,7) -> "01010"

indices(1,2,3,5,7) -> "01010"

indices(1,2,4,6,7) -> "01010"

indices(1,2,4,5,7) -> "01010"

indices(1,2,5,6,7) -> "01110"

5 mod(10^9  7) = 5

Test case 2:

input: "010110"

output: 3

indices(1,2,3,4,6) -> "01010"

indices(1,2,3,5,6) -> "01010"

indices(1,2,4,5,7) -> "01110"

3 mod(10^9  7) = 3

Test case 3

input: "01111"

output: 0

There is no palindromic subsequence of length 5

I have been trying to solve the challenge above in python. I tried a dynamic programming approach for counting palindromic subsequences which is quadratic O(n^2) where n is length of string, but I keep getting time limit exceeded. I think there is a more optimised solution that takes advantage of the characters being just "0" and "1" but I cant figure it out (I might be wrong).

Can anyone help me with a more optimised solution?

Any input is highly appreciated, thanks.

Here is the code I tried

def countSubSequences(s):
    output = 0
    n = len(s)
    dp = [[0]*n]*n

    for i in range(n-2,-1,-1):
        for j in range(i 2,n):
            if dp[i 1][j] == dp[i 1][j-1]:
                dp[i][j] = dp[i][j-1]   0
            else:
                dp[i][j] = dp[i][j-1]   (dp[i 1][j] - dp[i 1][j-1])
            
            if s[i] == s[j]:
                dp[i][j]  = j-i-1

    for i in range(n):
        for j in range(i 4,n):
            if s[i] == s[j]:
                output  = dp[i 1][j-1]

    return output % ((10**9)   7)

CodePudding user response：

Keeping track of counts of all subsequences up to five characters, takes about one second for a random string of length 100,000. Code including testing (Try it online!):

import random
from itertools import combinations
from math import comb
from time import time
from collections import Counter

mod = 10**9   7

def countSubSequences(s):
    ctr = Counter([''])
    fives = 0
    for c in s:
        for ss, cnt in list(ctr.items()):
            ss  = c
            if len(ss) < 5:
                ctr[ss]  = cnt
            elif ss == ss[::-1]:
                fives  = cnt
    return fives % mod

for _ in range(10):
    s = ''.join(random.choices('01', k=30))
    expect = sum(c == c[::-1] for c in combinations(s, 5)) % mod
    result = countSubSequences(s)
    print(result == expect, expect, result)

n = 100000
print(comb(n, 5) % mod)
print(countSubSequences('0' * n))

t = time()
print(countSubSequences(''.join(random.choices('01', k=n))), time() - t)

Sample output:

True 33636 33636
True 27591 27591
True 34409 34409
True 31441 31441
True 35551 35551
True 29305 29305
True 34632 34632
True 34683 34683
True 54957 54957
True 33009 33009
502061291
502061291
595650700 1.2116138935089111

Optimized version that's about 20 times faster, can even solve length 1,000,000 in less than a second (Try it online!). This combines equivalent subsequences, for example Oyxy keeps track of the number of all subsequences of length 4 which start with 0 and their second and fourth character are the same.

def countSubSequences(s):
    O = I = OO = OI = IO = II = OOx = OIx = IOx = IIx = Oyxy = Iyxy = zyxyz = 0
    for c in s:
        if c == '0':
            zyxyz  = Oyxy
            Iyxy  = IOx
            Oyxy  = OOx
            IIx  = II
            IOx  = IO
            OIx  = OI
            OOx  = OO
            IO  = I
            OO  = O
            O  = 1
        else:
            zyxyz  = Iyxy
            Iyxy  = IIx
            Oyxy  = OIx
            IIx  = II
            IOx  = IO
            OIx  = OI
            OOx  = OO
            II  = I
            OI  = O
            I  = 1
    return zyxyz % mod

CodePudding user response：

I was recently asked this question in Goldman Sachs interview.

There are only 8 possible binary substrings of length 5 which are ["00000", "01010", "10001", "11011", "00100", "01110", "10101", "11111"]

Now there exists a very simple Dynamic Programming solution to the question: Find number of times a particular string occurs as a subsequence in another string. The code for this problem can be found at https://www.geeksforgeeks.org/find-number-times-string-occurs-given-string/

You just have to run this code for the given binary input string and each binary string of length 5 declared previously, and sum up their results to get final answer.