Write a function that determines the maximum number of consecutive BA, CA character pairs per line

My respects, colleagues. I need to write a function that determines the maximum number of consecutive BA, CA character pairs per line.

print(f("BABABA125"))  # -> 3
print(f("234CA4BACA"))  # -> 2
print(f("BABACABACA56"))  # -> 5
print(f("1BABA24CA"))  # -> 2

Actually, I've written a function, but, to my mind, it's not very good.

def f(s: str) -> int:

    res = 0

    if not s:
        return res

    cur = 0
    i = len(s) - 1

    while i >= 0:
        # i > 0 keeps s[i-1] from wrapping around to the last character when i == 0
        if i > 0 and s[i] == "A" and (s[i-1] == "B" or s[i-1] == "C"):
            cur += 1
            i -= 2
        else:
            if cur > res:
                res = cur
            cur = 0  # reset the run on every non-matching character
            i -= 1
    else:
        if cur > res:
            res = cur

    return res

In addition, I'm not allowed to use libraries or regular expressions (only string and list methods). Could you please help me or review my code in this context? I'd be very grateful.

CodePudding user response：

Here's a function f2 that performs this operation.

if not re.search('(BA|CA)', s): return 0
First check whether the string contains any BA or CA at all, and return 0 if it does not. This prevents "ValueError: max() arg is an empty sequence" when max() is called on the matches later.
matches = re.finditer(r'(?:CA|BA)+', s)
Find every run of consecutive CA or BA pairs. The group is non-capturing because only the full match is needed.
res = max(matches, key=lambda m: len(m.group(0)))
Among the matches (re.Match objects), fetch each matched substring with m.group(0) and compare lengths to find the longest one.
return len(res.group(0))//2
Divide the length of the longest match by 2 to get the number of BA or CA pairs in it. Floor division // keeps the result an int, whereas true division / would return a float.

import re

strings = [
    "BABABA125",  # 3
    "234CA4BACA",  # 2
    "BABACABACA56",  # 5
    "1BABA24CA",  # 2
    "NO_MATCH_TO_BE_FOUND",  # 0
]

def f2(s: str):
    if not re.search('(BA|CA)', s): return 0
    matches = re.finditer(r'(?:CA|BA)+', s)
    res = max(matches, key=lambda m: len(m.group(0)))
    return len(res.group(0))//2

for s in strings:
    print(f2(s))

UPDATE: Thanks to @StevenRumbalski for providing a simpler version of the above answer. (I split it into multiple lines for readability)

def f3(s):
    if not re.search('(BA|CA)', s): return 0
    matches = re.findall(r'(?:CA|BA)+', s)
    max_length = max(map(len, matches))
    return max_length // 2

if not re.search('(BA|CA)', s): return 0
Same as above.
matches = re.findall(r'(?:CA|BA)+', s)
Find every run of consecutive CA or BA pairs. Each value in matches is now a str instead of a re.Match, which is easier to handle.
max_length = max(map(len, matches))
Map each matched substring to its length and take the maximum.
return max_length // 2
Floor-divide the length of the longest match by 2, the length of a single BA or CA pair, to get the number of consecutive pairs in it.
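
A small sketch of why the non-capturing group matters for re.findall: with a capturing group, findall returns the group's contents (only the last repetition) rather than the whole run. The name f3_default is hypothetical and not part of the answer above; it assumes that passing default=0 to max() is an acceptable way to cover the no-match case instead of the separate re.search check.

```python
import re

s = "BABACABACA56"

# Capturing group: findall returns only the last repetition of the group.
print(re.findall(r'(CA|BA)+', s))    # ['CA']

# Non-capturing group: findall returns each full match.
print(re.findall(r'(?:CA|BA)+', s))  # ['BABACABACA']

def f3_default(s: str) -> int:
    # Hypothetical variant of f3: default=0 is returned by max()
    # when findall produces an empty list, so no pre-check is needed.
    runs = re.findall(r'(?:CA|BA)+', s)
    return max(map(len, runs), default=0) // 2

print(f3_default("BABACABACA56"))          # 5
print(f3_default("NO_MATCH_TO_BE_FOUND"))  # 0
```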

CodePudding user response：

Here's an alternative implementation without any imports. Do note however that it's quite slow compared to your C-style implementation.

The idea is simple: Transform the input string into a string consisting of only two types of characters c1 and c2, with c1 representing CA or BA, and c2 representing anything else. Then find the longest substring of consecutive c1s.

The implementation is as follows:

Pick a char that is guaranteed not to appear in the input string; here we use + as an example. Then pick a second char different from the first; here we use -.
Replace each occurrence of CA and BA with a +.
Replace everything else in the string (that is not a +) with a - (this is why + cannot be present in the original input string). Now we have a string consisting purely of +s and -s.
Split the string with - as the delimiter, and map each resulting substring to its length.
Return the maximum of these substring lengths.
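
To make the steps above concrete, here is a rough trace on one of the sample inputs, showing the intermediate value after each step. It assumes + as the marker character and - as the filler; the variable names are only illustrative.

```python
s = "234CA4BACA"

# Replace every CA and BA pair with the marker character.
marked = s.replace("CA", "+").replace("BA", "+")
print(marked)   # 234+4++

# Turn every remaining character into the filler character.
masked = "".join(c if c == "+" else "-" for c in marked)
print(masked)   # ---+-++

# Split on the filler; each piece is one run of consecutive pairs.
runs = masked.split("-")
print(runs)     # ['', '', '', '+', '++']

# The longest piece gives the answer.
lengths = [len(r) for r in runs]
print(max(lengths))  # 2
```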

strings = [
    "BABABA125",  # 3
    "234CA4BACA",  # 2
    "BABACABACA56",  # 5
    "1BABA24CA",  # 2
    "NO_MATCH_TO_BE_FOUND",  # 0
]

def f4(string: str):
    string = string.replace("CA", "+")
    string = string.replace("BA", "+")
    string = "".join([(c if c == "+" else "-") for c in string])
    str_list = string.split("-")
    str_lengths = map(len, str_list)
    return max(str_lengths)

for s in strings:
    print(f4(s))
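
For comparison, a single forward pass in the spirit of the question's loop might look like the sketch below. It uses no imports and no marker characters, so it also works when the input contains + or -. The name max_pair_run is hypothetical, and this is one possible way to write it, not the only one.

```python
def max_pair_run(s: str) -> int:
    """Hypothetical helper: longest run of back-to-back BA/CA pairs in s."""
    best = 0  # longest run seen so far
    cur = 0   # length of the run currently being extended
    i = 0
    # Stop one character early: a pair needs s[i] and s[i + 1].
    while i < len(s) - 1:
        if s[i] in ("B", "C") and s[i + 1] == "A":
            cur += 1
            best = max(best, cur)
            i += 2  # skip past the whole pair
        else:
            cur = 0  # any other character breaks the run
            i += 1
    return best

for s in ["BABABA125", "234CA4BACA", "BABACABACA56", "1BABA24CA", "NO_MATCH_TO_BE_FOUND"]:
    print(max_pair_run(s))
```

Scanning left to right is safe here because a pair always ends in A, and A can never start another pair, so there is no ambiguity about where a pair begins.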