I am trying to capture the start and end of a capture group, for each group found, using the finditer() method in re.
For example:
import re

strng = 'move 12345-!'
matches = re.finditer('move ([0-9]+).*?', strng)
for each in matches:
    print(*each.groups())
    print(each.start(), each.end())
This yields the start and end index positions, but for the whole matched pattern rather than for the captured group. I essentially want to capture only the number, since that is the part that changes. The word move will always be an anchor, but I don't want to include it in the position: I need the actual position of each number within the text document so that I can slice out each number found.
Full document might be like:
move 12345-!
move 57496-!
move 96038-!
move 00528-!
And I would capture 57496, starting/ending at document[17:21], where the start of 57496 is at 17 and the end is at 21. The underlying positions are being used to train a model.
CodePudding user response:
If you don't want move to be part of the match, you can turn it into a positive lookbehind, which asserts that it appears directly to the left of the digits without consuming it. Then you can use each.group() to get the match.
Note that you can omit .*? at the end of the pattern: it is a non-greedy quantifier with nothing after it, so it will not match any characters.
import re

strng = 'move 12345-!'
matches = re.finditer('(?<=move )[0-9]+', strng)
for each in matches:
    print(each.group())
    print(each.start(), each.end())
Output
12345
5 10
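To see how this behaves on the multi-line document from the question, here is a minimal sketch. It assumes the lines are separated by a single "\n" (the question does not say), and the names document, pattern and spans are only illustrative. The exact offsets depend on the line separator, so they may differ from the numbers quoted in the question.

```python
import re

# Sample document from the question; the single "\n" separators are an assumption.
document = "move 12345-!\nmove 57496-!\nmove 96038-!\nmove 00528-!"

# The lookbehind keeps "move " out of the match, so start()/end()
# refer to the digits only.
pattern = re.compile(r"(?<=move )[0-9]+")

# Collect (start, end) for every number; end is exclusive, as in slicing.
spans = [(m.start(), m.end()) for m in pattern.finditer(document)]
print(spans)

# Each span can be used directly as a slice of the original document.
numbers = [document[start:end] for start, end in spans]
print(numbers)
```

Because end() is exclusive, document[start:end] returns the full number without any adjustment.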
CodePudding user response:
>>> import re
>>> strng = "move 12345-!"
>>> matches = re.finditer('move ([0-9]+).*?', strng)
>>> for each in matches:
...     print(each.group(1))
...     print(each.start(1), each.end(1))
...
12345
5 10
>>>
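The same idea can be sketched on the full document by passing the group to span(), which returns the start and end together. This assumes single "\n" line separators; the group name number and the variables document, pattern, results and tail_is_empty are illustrative choices, not part of the original code.

```python
import re

# Sample document from the question; the single "\n" separators are an assumption.
document = "move 12345-!\nmove 57496-!\nmove 96038-!\nmove 00528-!"

# Same pattern as above, with the capture group given a (hypothetical) name.
pattern = re.compile(r"move (?P<number>[0-9]+).*?")

results = []
tail_is_empty = True
for m in pattern.finditer(document):
    # span("number") is the (start, end) of the group, not of the whole match.
    start, end = m.span("number")
    # The trailing lazy .*? consumes nothing, so the whole match
    # ends exactly where the group ends.
    tail_is_empty = tail_is_empty and (m.end() == end)
    results.append((m.group("number"), start, end))

for text, start, end in results:
    # Slicing with the group's span reproduces the captured text.
    print(text, start, end, document[start:end] == text)
```

Passing a group index or name to start(), end() or span() is what limits the reported positions to the captured number and leaves out the move anchor.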