I am having a string in which some binary numbers are mentioned. I want to count number of occurrence of given pattern, but I want set my pattern above 7 digits of character, so the result should show only more than 7 characters. it means how I can set my pattern selection, so it should count only 7 digits and above (pattern = r"(0 1 0 1 0 1 )" {<7} ). please note: it should select 7 digits and above, not just fixed 7 digits.
import re
from collections import Counter
pattern = r"0 1 0 1 0 1 "
test_str = '01010100110011001100011100011110000101101110100001101011000111011001010011001001001101000011' \
'00110011001100110011010101001100110001111110010100100111010110001001100010011010110011'
cnt = Counter(re.findall(pattern, test_str))
print(cnt.most_common())
# result [('010101', 2), ('00110001110001111', 1), ('011101000011', 1), ('0111011001', 1), ('010010011', 1), ('001100110011', 1), ('00011111100101', 1), ('010110001', 1)]
result should be show only more than 7 character it no supposed to show ('010101', 2)
CodePudding user response:
The simplest solution is to filter the list of regex matches.
import re
from collections import Counter
pattern = r"0 1 0 1 0 1 "
test_str = '01010100110011001100011100011110000101101110100001101011000111011001010011001001001101000011' \
'00110011001100110011010101001100110001111110010100100111010110001001100010011010110011'
cnt = Counter([p for p in re.findall(pattern, test_str) if len(p) > 6])
print(cnt.most_common())
Output:
[('001100110011', 2), ('000111000111100001', 1), ('011011101', 1), ('00001101011', 1), ('000111011001', 1), ('010011001', 1), ('001001101', 1), ('00001100110011', 1), ('00110011000111111', 1), ('00101001', 1), ('0011101011', 1), ('000100110001', 1), ('001101011', 1)]