Given a string like '0100011'
and a length = 2
we have to count the frequency of pairs in the string: 00,01,10,11
.
00 -> 2 , 01 -> 2 , 10 ->1 , 11->1
The problem is if I just use the count('00')
function, it doesn't work since it only finds one occurrence: " _ _ 000 _ _"
.
I've been wondering what the right approach to this kind of problem would be, since for the moment i've tried to create some sort of substring checker with 2 pointer(index and length) which goes back and forth to be sure it doesn't miss any combination but it doesn't feel right
CodePudding user response:
If you want to find the number of occurrences of a single binary substring, you can use string slicing to get each consecutive pair of letters:
data = "0100011"
target = "00"
sum(data[start:start len(target)] == target for start in range(len(data) - len(target)))
This outputs:
2
If you want to find the number of occurrences of all binary strings of a particular length, you could use a collections.Counter
instead:
Counter(data[start:start len(target)] for start in range(len(data) - len(target)))
This outputs:
Counter({'01': 2, '00': 2, '10': 1})
(Note that if you try to look up the frequency of a binary string that doesn't occur, the Counter
will simply return 0
.)
CodePudding user response:
here is how I did it:
length = 2
s = '0100011'
counter = {}
for i in range(2**length):
b = f'{i:0{length}b}'
for j in range(len(s)-1):
if b == s[j:j length]:
counter[b] = counter.get(b, 0) 1
print(counter)
will produce:
{'00': 2, '01': 2, '10': 1, '11': 1}
for length == 3:
{'000': 1, '001': 1, '010': 1, '011': 1, '100': 1}
...
for length == 7, obviously:
{'0100011': 1}