Reading a sequence in pairs and increase the base by 1

Time:03-31

I want to read an amino acid sequence ("ACDEFGHIKL") in blocks of a predefined motif length, say 3, and print them. The output would be [ACD, EFG, HIK]. Next, I want to shift the starting position by 1, so the following output would be [CDE, FGH, IKL].

I wrote the following Python code, which works fine. I would like to know whether there is a simpler way to write it.

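motif_len = int(motif_len)

if len(AA_seq) >= motif_len:
    # Slide a window of motif_len characters along the sequence, one position at a time.
    for i in range(len(AA_seq) - motif_len + 1):
        a = i
        b = i + motif_len
        # print(a, b)
        print(AA_seq[a:b])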

Any comments or suggestions would be appreciated. I was wondering whether Python has a prebuilt library for this kind of function. Thanks.

CodePudding user response:

If I've understood correctly, you are implementing what is known as a 'sliding window' or 'rolling window'.

I was looking into this recently and came across the following thread:

Rolling or sliding window iterator?

This article may also be of interest:

https://medium.com/geekculture/implement-a-sliding-window-using-python-31d1481842a7

My conclusion was that there's no obvious inbuilt function to call for this one and that the simplest implementation is basically the one you have already worked out for yourself!
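
For comparison, a generic sliding window can be sketched with the standard library alone. This is only a minimal sketch, not a built-in: the name `sliding_window` and its parameters are made up for illustration, and the result is joined back into strings on the assumption that the input is a string such as the sequence in the question.

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, n):
    """Yield successive overlapping windows of length n as tuples (hypothetical helper)."""
    it = iter(iterable)
    # Pre-fill the window with the first n - 1 items.
    window = deque(islice(it, n - 1), maxlen=n)
    for item in it:
        window.append(item)  # maxlen makes the deque drop its oldest item
        yield tuple(window)


# For a string, join each tuple back into a substring.
motifs = ["".join(w) for w in sliding_window("ACDEFGHIKL", 3)]
print(motifs)
```

Unlike slicing, this also works on iterables that cannot be indexed, such as a file read in chunks, and it yields nothing when the input is shorter than `n`.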

CodePudding user response:

I would go this way:

group_len = 4
AA_seq = "ACDEFGHIKL"
print([AA_seq[i: i + group_len] for i in range(len(AA_seq) - group_len + 1) if len(AA_seq) >= group_len])

which for this specific case would result in:

['ACDE', 'CDEF', 'DEFG', 'EFGH', 'FGHI', 'GHIK', 'HIKL']

CodePudding user response:

You can use the re (regular expression) module to get the list of blocks:

import re
re.findall('...','ACDEFGHIKL')

Another option is the textwrap library:

from textwrap import wrap
wrap('ACDEFGHIKL', 3)
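
The two calls differ in how they treat a leftover at the end of the sequence, which matters here because 10 residues do not divide evenly by 3. The sketch below puts them side by side and adds a plain-slicing variant. The variable names are illustrative only, and the sequence is assumed to contain no whitespace, since `wrap` is designed for text and treats whitespace specially.

```python
import re
from textwrap import wrap

seq = "ACDEFGHIKL"  # 10 residues: three blocks of 3 plus one leftover residue
size = 3

regex_blocks = re.findall("...", seq)  # only full 3-character matches, leftover dropped
wrap_blocks = wrap(seq, size)          # keeps the shorter final piece

# Plain slicing with a step, dropping any incomplete final block.
slice_blocks = [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]

print(regex_blocks)
print(wrap_blocks)
print(slice_blocks)
```

If the incomplete final block should be kept, the slicing variant can use `range(0, len(seq), size)` instead.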

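To finish, drop one leading character at a time and split whatever remains:

import re

s = 'ACDEFGHIKL'

def get_blocks(seq):
    return re.findall('...', seq)  # or: wrap(seq, 3)

s_cur = s
for _ in range(len(s)):
    print(get_blocks(s_cur))
    s_cur = s_cur[1:]  # drop the first character to shift the start by one

Here get_blocks is a function that uses one of the two methods above; the regex version is shown as an example.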
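
The same idea can also be written without copying the string on every pass, by starting the slices at a given offset. This is a sketch under stated assumptions: `blocks_from_offset` is a hypothetical helper name, incomplete trailing blocks are dropped, and only offsets 0 to size - 1 are tried because later offsets repeat earlier framings with the leading blocks removed.

```python
def blocks_from_offset(seq, size, offset):
    """Return the full, non-overlapping blocks of `size` that start at `offset`."""
    return [seq[i:i + size] for i in range(offset, len(seq) - size + 1, size)]


seq = "ACDEFGHIKL"
size = 3

# One list of blocks per starting offset.
frames = [blocks_from_offset(seq, size, offset) for offset in range(size)]
for frame in frames:
    print(frame)
```

For the sequence in the question, the first two frames are the [ACD, EFG, HIK] and [CDE, FGH, IKL] outputs asked for.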
