Find occurrences of pairs of letters in string with regex

Time:12-15

my code:


import re

regex = r"BB"

test_str = "NBCCNBBBCBHCB"

matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches, start=1):
    
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

output:

Match 1 was found at 5-7: BB

My question is, why doesn't it match the second pair of BB?

(NBCCNBBBCBHCB)

CodePudding user response:

Only one instance of BB is found because the regex consumes the input it matches. After the first BB (positions 5-7), the search resumes at position 7, and from there no two adjacent Bs remain: the third B of BBB cannot pair with the second, which was already consumed.

Rather than thinking about "pairs", what you want to find is "every B that is followed by a B":

B(?=B)

See live demo (slightly modified to highlight the quantity of matches).

This finds 2 hits, as you expect, because only one B is consumed per match (lookaheads don't consume input).
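
As a rough illustration, here is a minimal sketch that runs both patterns against the string from the question and compares where the matches start. The variable names plain_starts and overlap_starts are made up for this example.

```python
import re

test_str = "NBCCNBBBCBHCB"

# Plain pattern: each match consumes both Bs, so the overlapping pair is skipped.
plain_starts = [m.start() for m in re.finditer(r"BB", test_str)]

# Lookahead pattern: only the first B is consumed; the second is checked but not consumed.
overlap_starts = [m.start() for m in re.finditer(r"B(?=B)", test_str)]

print(plain_starts)    # [5]
print(overlap_starts)  # [5, 6]
```

Note that each lookahead match is a single B, so match.group() returns "B" rather than "BB". If you need the pair itself, slicing test_str[i:i + 2] at each start index i should recover it.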
