Home > Net >  Group string by similar substrings
Group string by similar substrings

Time:05-31

I'm trying to take a number, convert it into binary, and then produce a list of the length of like terms.

For example, n=49 is represented in binary as "110001". I would like to return a list of the length of ["11", "000", "1"], which is [2, 3, 1].

So far I'm able to convert a number into binary using "{0:b}".format(n) but I cannot group the like terms.

CodePudding user response:

You might use itertools.groupby for this task as follows

import itertools
t = "110001"
lengths = [len(list(g)) for k,g in itertools.groupby(t)]
print(lengths)

output

[2, 3, 1]

itertools.groupby does found runs of consecutive same elements (unless 2nd argument is given), g here is iterator which is used to create list which length is measured. Observe that you can not do len(g).

CodePudding user response:

You can do it like this:

from itertools import groupby

def binary_chunks(n: int) -> list:
    return [len("".join(g)) for _, g in groupby(f"{n:b}")]

For example:

>>> binary_chunks(49) # binary: 110001
>>> [2, 3, 1]

>>> binary_chunks(4) # binary: 100
>>> [1, 2]

>>> binary_chunks(123456789) # binary: 111010110111100110100010101
>>> [3, 1, 1, 1, 2, 1, 4, 2, 2, 1, 1, 3, 1, 1, 1, 1, 1]

I also ran a timeit test if you're interested:

>>> %timeit binary_chunks(49)
>>> 2.28 µs ± 208 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

CodePudding user response:

You ca use this after converting your number (careful, it should be in string):

from itertools import groupby

bin_number = "110001"
result = ["".join(group) for ele, group in groupby(bin_number)]
print(result)

output:

["11", "000", "1"]

CodePudding user response:

You can also solve this problem with the re package:

import re
def bit_run_lengths(n, pat=re.compile('1 |0 ')):
  return [
           len(m)
           for m in pat.findall(bin(n)[2:])
         ]

According to my benchmarking, that's about twice as fast as the itertools.groupby solution.

For a more general (single-character) run-length converter, you can use a back-reference. (This is the rare case in which back references are safe to use.) It's slightly slower, but it still beats groupby by a bit:

def run_length(s, pat=re.compile(r'((.)\2*)')):
  return [len(m[0]) for m in pat.findall(s)]
>>> run_length("aabbbbbbbzxxaaa")
[('a', 2), ('b', 7), ('z', 1), ('x', 2), ('a', 3)]
  • Related