I want to write a Python program that uses a regular expression to count n-digit binary numbers of a given form (I call these modified binary numbers) in a file that contains binary data. For example, I want to count the 5-digit numbers that start with 1 and end with 0, which are 10000, 10010, 10100, 10110, 11000, 11010, 11100 and 11110. As a smaller example, to count the 4-digit binary numbers that start with 1 and end with 0, this is what I am doing (to show you, I am using a binary string instead of a file):
a_string = '011010010111001101101111011011010110110101110011010000110010010111000100100110110101101111011011110111011001101100011011010111011001101000011001001101100011100010010110110011111011001110001001011011'
s_0 = a_string.count('1000')
s_1 = a_string.count('1010')
s_2 = a_string.count('1100')
s_3 = a_string.count('1110')
print(1000, s_0, '\n', 1010, s_1, '\n', 1100, s_2, '\n', 1110, s_3)
Result:
1000 = 7, 1010 = 7, 1100 = 13, 1110 = 11
Please note that I want to count each binary number separately.
CodePudding user response:
You might use a pattern and create a dictionary.
(?=(1[01]{2}0))
(?= Positive lookahead, assert to the right of the current position
(1[01]{2}0) Capture group 1 (which will be returned by re.findall): match 1, then 2 times either 0 or 1, and match a 0 at the end
) Close lookahead
import re
pattern = r"(?=(1[01]{2}0))"
s = "011010010111001101101111011011010110110101110011010000110010010111000100100110110101101111011011110111011001101100011011010111011001101000011001001101100011100010010110110011111011001110001001011011"
cnt = dict()
for i in re.findall(pattern, s):
    # get the value by key, or return 0 if the key does not exist
    cnt[i] = cnt.get(i, 0) + 1
print(cnt)
Output
{'1010': 7, '1110': 11, '1100': 13, '1000': 7}
For a pattern with 5 digits that starts with 1 and ends with 0:
pattern = r"(?=(1[01]{3}0))"
The output will be:
{'11010': 7, '10100': 3, '10010': 7, '11100': 5, '10110': 17, '11110': 4, '10000': 2, '11000': 5}
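Since the two patterns above differ only in the repeat count, the pattern can be built from the number of digits. A minimal sketch, assuming a hypothetical helper named count_windows (it is not part of the answer above) and using collections.Counter in place of the manual dictionary:

```python
import re
from collections import Counter


def count_windows(bits, n, first="1", last="0"):
    """Count each n-digit window of `bits` that starts with `first` and ends with `last`.

    Hypothetical helper for illustration; overlapping windows are counted.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    # The lookahead consumes no characters, so the scan advances one
    # position at a time and overlapping windows are all found.
    pattern = rf"(?=({re.escape(first)}[01]{{{n - 2}}}{re.escape(last)}))"
    return Counter(re.findall(pattern, bits))


print(count_windows("101010", 4))  # Counter({'1010': 2})
```

To use it on a file, read the contents into a string first and pass that string as `bits`.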
CodePudding user response:
With your method you are including overlapping sequences in the total count. For instance, a_string[9:13] and a_string[10:14] both contain a 4-digit sequence starting with 1 and ending with 0.
A regex may be useful if you wanted to exclude overlaps:
import re
# this will output 26, whereas the individual count() calls sum to 38
pat = r'1\d{2}0'
print(len(re.findall(pat, a_string)))
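To see the difference on a small scale, here is a short sketch that runs both kinds of pattern on a made-up 6-digit string (not the string from the question). The plain pattern consumes each match, while the lookahead version tries every start position:

```python
import re

bits = "101010"  # short made-up sample for illustration

# A plain pattern consumes each match, so the scan resumes after it.
non_overlapping = re.findall(r"1[01]{2}0", bits)

# A lookahead consumes nothing, so every start position is tried.
overlapping = re.findall(r"(?=(1[01]{2}0))", bits)

print(len(non_overlapping), len(overlapping))  # 1 2
```

Which of the two counts is the right one depends on whether windows that share digits should each be counted.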