my code:
import re

regex = r"BB"
test_str = "NBCCNBBBCBHCB"
matches = re.finditer(regex, test_str)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))
output:
Match 1 was found at 5-7: BB
My question is, why doesn't it match the second pair of BB?
(NBCCNBBBCBHCB)
CodePudding user response:
The reason only one instance of BB is found is that the regex consumes input as it matches: after the first BB matches at 5-7, the search resumes at index 7, and the remaining text (BCBHCB) contains no further BB pair.
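To see the consumption for yourself, here is a small sketch that repeats by hand what finditer does: find the first match, then search again starting from where that match ended. The names pattern, first and rest are only for illustration.

```python
import re

test_str = "NBCCNBBBCBHCB"
pattern = re.compile(r"BB")

# First match: the two B's at indices 5 and 6.
first = pattern.search(test_str)
print(first.span())

# The next search resumes at the end of the previous match (index 7),
# so the B at index 6 is no longer available to start a new pair.
rest = pattern.search(test_str, first.end())
print(rest)

# What is left to scan contains no "BB".
print(test_str[first.end():])
```

The second search returns None, which is why the loop in the question prints only one match.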
Rather than thinking about "pairs", what you want to find is "every B that is followed by another B":
B(?=B)
See live demo (slightly modified to highlight the quantity of matches).
This finds 2 hits, as you expect, because only one B is consumed per match (lookaheads don't consume input).
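A minimal sketch of the lookahead version in Python, using the test string from the question. The second pattern, (?=(BB)), is an optional variant not mentioned above: it puts the whole pair inside a capturing group within the lookahead, in case you want the matched text to be BB rather than a single B.

```python
import re

test_str = "NBCCNBBBCBHCB"

# Each match consumes one B; the lookahead only checks that another B follows.
starts = [m.start() for m in re.finditer(r"B(?=B)", test_str)]
print(starts)

# Variant: a zero-width match that captures the overlapping pair in group 1.
pairs = [(m.start(), m.group(1)) for m in re.finditer(r"(?=(BB))", test_str)]
print(pairs)
```

Both patterns report two hits, starting at indices 5 and 6.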