Python program that counts modified binary numbers using a regular expression

I want to write a Python program that uses a regular expression to count n-digit numbers (modified binary numbers) in a file that contains a binary number. For example, to count 5-digit numbers that start with 1 and end with 0, the numbers are 10000, 10010, 10100, 10110, 11000, 11010, 11100 and 11110 (these are what I call modified binary numbers). To count 4-digit numbers that start with 1 and end with 0, this is what I am doing now (to show you, I use a binary string instead of the file):


a_string = '011010010111001101101111011011010110110101110011010000110010010111000100100110110101101111011011110111011001101100011011010111011001101000011001001101100011100010010110110011111011001110001001011011'

s_0 = a_string.count('1000')
s_1 = a_string.count('1010')
s_2 = a_string.count('1100')
s_3 = a_string.count('1110')


print(1000, s_0, '\n', 1010, s_1, '\n', 1100, s_2, '\n', 1110, s_3)

Result:

1000 = 7, 1010 = 7, 1100 = 13, 1110 = 11

Please note: I want to count each binary number separately.
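Since the question asks about n-digit numbers in general, the list of patterns does not have to be typed by hand. A minimal sketch that generates it with itertools.product and keeps the one-str.count-call-per-pattern approach from the question; the helper name all_patterns and the short sample string are illustrative assumptions, not part of the original code:

```python
from itertools import product

def all_patterns(n, first="1", last="0"):
    # Hypothetical helper: every n-digit binary string with a fixed first and last digit
    return [first + "".join(mid) + last for mid in product("01", repeat=n - 2)]

# In the real program this string would be read from the file,
# e.g. open(path).read().strip()  (path is a placeholder)
a_string = "11100101000"

print(all_patterns(5))
# ['10000', '10010', '10100', '10110', '11000', '11010', '11100', '11110']

# Same approach as the question, without hard-coding: one str.count call per pattern
counts = {p: a_string.count(p) for p in all_patterns(4)}
print(counts)
```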

Answer:

You could use a lookahead pattern that captures every 4-digit window and tally the matches in a dictionary.

(?=(1[01]{2}0))

(?= Positive lookahead: assert that what follows the current position matches, without consuming it
(1[01]{2}0) Capture group 1 (this is what re.findall returns): match 1, then exactly 2 characters that are each 0 or 1, then a final 0
) Close the lookahead
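To see why the lookahead yields overlapping windows, re.finditer can report where each match starts. A minimal sketch on a made-up five-character string:

```python
import re

s = "11100"  # made-up example string

# The lookahead matches the empty string (group 0 is ''), so the engine
# advances one character after each match; group 1 captures the 4 digits
matches = [(m.start(), m.group(0), m.group(1))
           for m in re.finditer(r"(?=(1[01]{2}0))", s)]
for start, whole, digits in matches:
    print(start, repr(whole), digits)
# 0 '' 1110
# 1 '' 1100
```

The windows at index 0 and index 1 share three digits, and both are reported because nothing is consumed between matches.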


import re

pattern = r"(?=(1[01]{2}0))"
s = "011010010111001101101111011011010110110101110011010000110010010111000100100110110101101111011011110111011001101100011011010111011001101000011001001101100011100010010110110011111011001110001001011011"

cnt = dict()
for i in re.findall(pattern, s):
    # get the value by key, or return 0 if the key does not exist
    cnt[i] = cnt.get(i, 0) + 1

print(cnt)

Output

{'1010': 7, '1110': 11, '1100': 13, '1000': 7}
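The manual dict.get loop can also be written with collections.Counter, which tallies the strings returned by re.findall. A minimal sketch on a short made-up string (not the question's data):

```python
import re
from collections import Counter

pattern = r"(?=(1[01]{2}0))"
s = "1010101100"  # made-up sample; use the real data in practice

# Counter counts each distinct string returned by re.findall,
# replacing the manual cnt.get(i, 0) + 1 loop
cnt = Counter(re.findall(pattern, s))
print(cnt)
# Counter({'1010': 2, '1100': 1})
```

A Counter also returns 0 for a key it has never seen (cnt['1000'] here), which helps when every pattern should be reported separately.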

For a pattern with 5 digits that starts with 1 and ends with 0:

pattern = r"(?=(1[01]{3}0))"

The output will be:

{'11010': 7, '10100': 3, '10010': 7, '11100': 5, '10110': 17, '11110': 4, '10000': 2, '11000': 5}
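To avoid editing the quantifier by hand for each length, the pattern can be built from n. The sketch below also pre-fills every possible pattern with 0, so numbers that never occur still appear in the result. count_modified_binary is a hypothetical helper name, and it assumes n >= 2 and that first and last are single binary digits:

```python
import re
from itertools import product

def count_modified_binary(s, n, first="1", last="0"):
    """Count overlapping occurrences of each n-digit binary string that
    starts with `first` and ends with `last` (hypothetical helper)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    # n - 2 free middle digits; for n = 5 this builds (?=(1[01]{3}0))
    pattern = rf"(?=({first}[01]{{{n - 2}}}{last}))"
    # Pre-fill every candidate with 0 so absent numbers are still reported
    counts = {first + "".join(mid) + last: 0
              for mid in product("01", repeat=n - 2)}
    for match in re.findall(pattern, s):
        counts[match] += 1
    return counts

print(count_modified_binary("1010101100", 4))
# {'1000': 0, '1010': 2, '1100': 1, '1110': 0}
```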


Answer:

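With your method, the total includes overlapping sequences: each count() call skips overlaps of its own pattern, but windows matched by different patterns can overlap. For instance, a_string[9:13] ('1110') and a_string[10:14] ('1100') share three digits, and both are 4-digit sequences that start with 1 and end with 0.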

A single regex is useful if you want to exclude overlaps:

import re
# findall returns non-overlapping matches: this outputs 26,
# while the four separate count() calls sum to 38
pat = r'1\d{2}0'
print(len(re.findall(pat, a_string)))
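The three counting strategies discussed in this thread can disagree, and which one is correct depends on whether overlapping windows should count. A minimal comparison on a made-up 10-character string (not the question's data):

```python
import re

s = "1010101100"  # made-up example string
patterns = ["1000", "1010", "1100", "1110"]

# 1) Separate str.count calls: each call skips overlaps of its own pattern,
#    but windows matched by different patterns may overlap each other
per_pattern = {p: s.count(p) for p in patterns}

# 2) One non-overlapping regex: every character belongs to at most one match
non_overlapping = re.findall(r"1[01]{2}0", s)

# 3) Lookahead regex: every starting position is tried, so all windows count
overlapping = re.findall(r"(?=(1[01]{2}0))", s)

print(per_pattern)           # {'1000': 0, '1010': 1, '1100': 1, '1110': 0}
print(len(non_overlapping))  # 2
print(len(overlapping))      # 3
```

Here the '1010' starting at index 2 overlaps the one at index 0, so str.count and the non-overlapping regex both skip it, and only the lookahead counts all three windows.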