Home > Back-end >  How to Find patterns forming clumps in a string?
How to Find patterns forming clumps in a string?

Time:10-15

I'm working on this programming problem. The code is supposed to return clumps like this example,

Input: CGGACTCGACAGATGTGAAGAACGACAATGTGAAGACTCGACACGACAGAGTGAAGAGAAGAGGAAACATTGTAA 5 50 4

Output: CGACA GAAGA

Here is the code that I've used:

def frequency_table(text, kmer_len):
freq_map = {}
nt = len(text)
nk = kmer_len

for i in range(0, nt-nk):
    pattern = text[i : i nk]
    if not freq_map.get(pattern):
        freq_map[pattern] = 1
    else:
        freq_map[pattern] = freq_map[pattern]   1
    
return freq_map

def FindClumps(Text, k, L, t):
Patterns = []
n = len(Text)
for i in range(n - L):
    Window = str(Text[i:L])
    freqMap = list(frequency_table(Window, k))
    for s in range(len(freqMap)):
        if len(freqMap[s]) >= t:
            Patterns.append(freqMap[s])

return Patterns

Every time I submit the answer, It's said that I'm wrong.

Is there a problem with my code? Or is there an underlying concept that I don't understand?

CodePudding user response:

I found a couple mistakes. Below is the corrected code with inline comments explaining the changes.

def frequency_table(text, kmer_len):
    freq_map = {}
    nt = len(text)
    nk = kmer_len
    
    for i in range(0, nt-nk 1): # Range short by one
        pattern = text[i : i nk]
        if not freq_map.get(pattern):
            freq_map[pattern] = 1
        else:
            freq_map[pattern] = freq_map[pattern]   1
        
    return freq_map

def FindClumps(Text, k, L, t):
    Patterns = []
    n = len(Text)
    for i in range(n - L):
        Window = str(Text[i : i L]) # End range needs to be in relation to "i"
        freqMap = frequency_table(Window, k)
        # You can interate dictionary keys. Previously, you converted the 
        # dictionary to a list (which removed your values) in order to iterate 
        # over it. Your conditional then became nonsensical.
        for s in freqMap:
            if freqMap[s] >= t:
                Patterns.append(s)
    
    return list(set(Patterns)) # limit output to unique values only


print(FindClumps("CGGACTCGACAGATGTGAAGAACGACAATGTGAAGACTCGACACGACAGAGTGAAGAGAAGAGGAAACATTGTAA", 5, 50, 4))

Output

['GAAGA', 'CGACA']
  • Related