Encode a DNA string so that runs of identical consecutive characters are grouped into their number of occurrences

Time:11-01

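I need help writing Python code that returns the output_string shown in the examples below. Each run of two or more identical consecutive characters is replaced by the run length followed by the character, and single characters are left unchanged.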

Example 1:

input_string = "AAABCCCCDDA"
output_string = "3AB4C2DA"

Example 2:

input_string = "ABBBBCCDDDDAAAAA"
output_string = "A4B2C4D5A"
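To pin down the rule the two examples imply, here is a minimal sketch using only a plain loop and no imports. The function name `encode_runs` is hypothetical and not part of the question.

```python
def encode_runs(s: str) -> str:
    """Run-length encode s: runs of 2+ identical characters become
    <count><char>; single characters are copied unchanged."""
    parts = []
    i = 0
    while i < len(s):
        j = i
        # Advance j past every consecutive character equal to s[i]
        while j < len(s) and s[j] == s[i]:
            j += 1
        run_length = j - i
        # Only prefix a count when the run is longer than one character
        parts.append(f"{run_length}{s[i]}" if run_length > 1 else s[i])
        i = j
    return "".join(parts)


print(encode_runs("AAABCCCCDDA"))       # 3AB4C2DA
print(encode_runs("ABBBBCCDDDDAAAAA"))  # A4B2C4D5A
```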

Answer:

You can use itertools.groupby.
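groupby only groups *consecutive* equal items (unlike SQL's GROUP BY), which is exactly what this encoding needs: the two separate runs of 'A' stay separate. A quick illustration of what it yields for the first example:

```python
from itertools import groupby

# Each item from groupby is a (key, iterator) pair for one run of equal
# consecutive characters; list() consumes the iterator to measure the run.
runs = [(key, len(list(group))) for key, group in groupby("AAABCCCCDDA")]
print(runs)  # [('A', 3), ('B', 1), ('C', 4), ('D', 2), ('A', 1)]
```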

In Python 3.8+, you can use the walrus operator (:=) to write a short one-liner:

>>> from itertools import groupby
>>> input_string = "ABBBBCCDDDDAAAAA"
>>> ''.join(f"{len_g}{k}" if (len_g := len(list(g))) > 1 else k for k, g in groupby(input_string))
'A4B2C4D5A'

In Python < 3.8:

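from itertools import groupby

input_string = "AAABCCCCDDA"

st = ''
for k, g in groupby(input_string):
    len_g = len(list(g))  # length of the current run
    if len_g > 1:
        st += f"{len_g}{k}"  # run of 2+: count followed by the character
    else:
        st += k  # single character: copied as-is

print(st)

Output: 3AB4C2DA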
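A variation on the loop above, wrapped in a reusable function (the name `encode` is hypothetical). It counts each run with `sum(1 for _ in group)` instead of building a list, and collects the pieces in a list joined once at the end, which is generally considered the more idiomatic way to build a string in Python. It needs no walrus operator, so it runs on any Python 3.6+ (for the f-string).

```python
from itertools import groupby


def encode(s: str) -> str:
    """Prefix each run of 2+ identical characters with its length."""
    parts = []
    for char, group in groupby(s):
        # Count the run without materializing an intermediate list
        count = sum(1 for _ in group)
        parts.append(f"{count}{char}" if count > 1 else char)
    return "".join(parts)


for sample in ("AAABCCCCDDA", "ABBBBCCDDDDAAAAA"):
    print(encode(sample))
# 3AB4C2DA
# A4B2C4D5A
```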

Answer:

A regular expression can also do the trick:

from re import sub

dna = "AAABCCCCDDA"
# (\w)\1+ matches a run of 2+ identical characters; replace it with count + char
print(sub(r'(\w)\1+', lambda m: str(len(m[0])) + m[1], dna))  # 3AB4C2DA
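To see why the pattern `(\w)\1+` works, it helps to list what it matches. `(\w)` captures one character and the backreference `\1+` demands one or more repeats of that same character, so only runs of length 2 or more match; single characters such as the lone 'B' and the final 'A' are never matched and `sub` leaves them alone. Note that `\w` also matches digits and underscores, which is harmless for DNA letters.

```python
import re

# (\w) captures one word character; \1+ requires one or more repeats of
# that same character, so each match is a run of length >= 2.
pattern = re.compile(r"(\w)\1+")

for m in pattern.finditer("AAABCCCCDDA"):
    print(m.start(), m[0], m[1])
# 0 AAA A
# 4 CCCC C
# 8 DD D
```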