I need to find the maximum number of consecutive occurrences of each word in a string. Repeats count only when they are back to back; another match of the same word elsewhere in the sequence that is not adjacent to the run does not count. Here is an example:
sequence = "ababcbabc"
words = ["ab", "babc", "bca"]
output:
[2, 2, 0]
This is because 'ab' actually appears 3 times in the sequence string, but the third occurrence doesn't count because it is not consecutive with the first two. The same rule applies to 'babc', and the result is 0 when the word doesn't occur at all, as with 'bca'. I have tried sequence.find, but it only gives me where the first occurrence starts, which is not convenient for checking whether the occurrences are adjacent; the same goes for sequence.rfind. sequence.count gives me all the occurrences without applying any condition: with .count I get output = [3, 2, 0]. I also tried re.findall and re.finditer.
If the sequence were 'abrtfhg', there is only one match, which counts as 1, so the output in this case should be [1, 0, 0].
def maxKOccurrences(sequence, words):
    result = []
    for i in words:
        if i in sequence:
            index_word = sequence.count(i)
            result.append(index_word)
        else:
            result.append(0)
    print(result)

x = "ababcbabc"
y = ["ab", "babc", "bca"]
maxKOccurrences(x, y)
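To make the difference between counting and adjacency concrete, here is a minimal sketch of what those calls return for 'ab' in the example sequence. The variable names (first, total, starts, adjacent) are only illustrative, and re.escape is an added precaution for words containing regex metacharacters.

```python
import re

sequence = "ababcbabc"
word = "ab"

# str.find reports only the first match; str.count counts every
# non-overlapping match, adjacent or not.
first = sequence.find(word)
total = sequence.count(word)

# re.finditer exposes the start index of each match, so adjacency can be
# checked: two matches are back to back when the next one starts exactly
# len(word) characters after the previous one.
starts = [m.start() for m in re.finditer(re.escape(word), sequence)]
adjacent = [b - a == len(word) for a, b in zip(starts, starts[1:])]

print(first, total, starts, adjacent)  # 0 3 [0, 2, 6] [True, False]
```

The matches at 0 and 2 are adjacent, while the one at 6 is separated from them, which is why the expected answer for 'ab' is 2 rather than 3.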
CodePudding user response:
You can try:
import re

def max_occur(s, words):
    # For each word, find every run of back-to-back repeats and convert its length to a repeat count.
    repeats = [list(map(lambda x: len(x) // len(word), re.findall(rf'(?:{word})+', s))) for word in words]
    # 1 if only single occurrences exist; otherwise the sum of all runs longer than 1 (0 if no match).
    return [1 if max(rep, default=0) == 1 else sum(r for r in rep if r > 1) for rep in repeats]

print(max_occur('ababcbabc', ["ab", "babc", "bca"])) # [2, 2, 0]
print(max_occur('aaabaa', ['a', 'b', 'c'])) # [5, 1, 0]
print(max_occur('aaabaabb', ['a', 'b', 'c'])) # [5, 2, 0]
The regex here detects runs of consecutive repeats of word, and the code then divides the length of each run by the length of the word (len(x) // len(word)), which yields the number of repeats in that run. The rest of the code processes this result with somewhat complicated logic: if the longest run is 1 (i.e., there are only singletons), it just returns 1. Otherwise it sums the runs other than singletons.
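The two steps can also be separated to make the intermediate values visible. This is only a sketch of the same logic: the helper names run_lengths and score are hypothetical, and re.escape is an added safeguard that the code above does not use.

```python
import re

def run_lengths(s, word):
    """Number of repeats in each run of back-to-back copies of word in s."""
    # re.escape is an addition here, so metacharacters in word match literally.
    pattern = rf'(?:{re.escape(word)})+'
    return [len(run) // len(word) for run in re.findall(pattern, s)]

def score(runs):
    """1 if only singletons exist, else the sum of runs longer than 1."""
    if max(runs, default=0) == 1:
        return 1
    return sum(r for r in runs if r > 1)

for word in ["ab", "babc", "bca"]:
    runs = run_lengths("ababcbabc", word)
    print(word, runs, score(runs))
# ab [2, 1] 2
# babc [2] 2
# bca [] 0
```

Note that this rule sums every run longer than 1, so 'a' in 'aaabaa' has runs [3, 2] and scores 5. If only the single longest run is wanted instead, max(runs, default=0) would give 3 for that case.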