python finditer get start end of capture group [duplicate]

Time:09-29

I am trying to get the start and end positions of a capture group for each match found by the finditer() function in re.

For example:

import re

strng = 'move 12345-!'
matches = re.finditer(r'move ([0-9]+).*?', strng)
for each in matches:
    print(*each.groups())
    # start()/end() with no argument give the span of the whole match
    print(each.start(), each.end())

This prints the start and end indexes of the whole matched pattern, not of the captured group specifically. I always want to capture the number, since it changes from line to line. The word move will always be present as an anchor, but I don't want it included in the position: I need the actual position of each number in the text document so that I can slice out every number found.

The full document might look like:

move 12345-!
move 57496-!
move 96038-!
move 00528-!

I would then capture 57496 at document[17:21], where 57496 starts at 17 and ends at 21. These positions are used to train a model.
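A minimal sketch of the whole-document case, assuming the four sample lines are joined with "\n" (the question does not say how lines are separated, so the exact offsets depend on that). Passing the group number to start() and end() returns the offsets of the captured digits rather than of the whole match. Python slice ends are exclusive, so with this layout 57496 comes out at 18:23, a five-character slice.

```python
import re

# The sample document from the question, assuming "\n" line separators.
document = "move 12345-!\nmove 57496-!\nmove 96038-!\nmove 00528-!"

# start(1)/end(1) report the offsets of capture group 1, not of the whole match.
positions = []
for m in re.finditer(r"move ([0-9]+)", document):
    positions.append((m.group(1), m.start(1), m.end(1)))

for number, start, end in positions:
    # end is exclusive, so document[start:end] slices out exactly the number.
    assert document[start:end] == number
    print(number, start, end)
```

With "\r\n" or no separator at all the numbers shift, but document[start:end] still returns the digits because the offsets are computed from the actual string.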

Answer:

If you don't want move to be part of the match, you can turn it into a positive lookbehind, (?<=move ), which asserts that it appears to the left of the digits without consuming it.

Then you can use each.group() to get the match.

Note that you can omit .*? at the end of the pattern: it is a non-greedy quantifier with nothing after it, so it is always satisfied by the empty string and never matches any characters.
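A quick check of that claim on the sample string: the pattern with the trailing .*? and the pattern without it produce the same overall match and span.

```python
import re

text = "move 12345-!"

# A trailing lazy .*? with nothing after it is satisfied by the empty string,
# so both patterns produce the same overall match.
with_lazy = re.match(r"move ([0-9]+).*?", text)
without_lazy = re.match(r"move ([0-9]+)", text)

print(with_lazy.group(0), with_lazy.span())        # move 12345 (0, 10)
print(without_lazy.group(0), without_lazy.span())  # move 12345 (0, 10)
```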

import re

strng = 'move 12345-!'
# The lookbehind requires "move " before the digits but leaves it out of the match
matches = re.finditer(r'(?<=move )[0-9]+', strng)
for each in matches:
    print(each.group())
    print(each.start(), each.end())

Output

12345
5 10
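A sketch of the same lookbehind applied to the full sample document, again assuming "\n" line separators. One caveat worth knowing: Python's re only accepts fixed-width lookbehinds, so a variant such as (?<=move\s+) that tolerates extra spaces is rejected with re.error when the pattern is compiled. The variable name variable_width_ok is only illustrative.

```python
import re

document = "move 12345-!\nmove 57496-!\nmove 96038-!\nmove 00528-!"

# "move " is only asserted, not consumed, so group(), start() and end()
# all refer to the number itself.
spans = [(m.group(), m.start(), m.end())
         for m in re.finditer(r"(?<=move )[0-9]+", document)]
print(spans)

# Python's re requires lookbehinds to be fixed width, so a
# variable-width lookbehind is rejected at compile time.
try:
    re.compile(r"(?<=move\s+)[0-9]+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```

If the whitespace after move can vary, the capture-group approach (start(1)/end(1)) avoids the fixed-width restriction.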

Answer:

>>> import re
>>> strng = "move 12345-!"
>>> matches = re.finditer(r'move ([0-9]+).*?', strng)
>>> for each in matches:
...     print(each.group(1))
...     print(each.start(1), each.end(1))
...
12345
5 10
>>>
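Two related conveniences of match objects: span(g) returns (start(g), end(g)) as one tuple, and a named group can be referred to by name instead of by number. In this sketch the group name num is a hypothetical choice made for illustration.

```python
import re

strng = "move 12345-!"

# span(g) is shorthand for (start(g), end(g)); the named group "num"
# is an illustrative name and can be used wherever a group number is.
results = []
for each in re.finditer(r"move (?P<num>[0-9]+)", strng):
    results.append((each.span(1), each.span("num"), each.group("num")))
    print(each.span(1))                        # (5, 10)
    print(each.start("num"), each.end("num"))  # 5 10
    print(each.group("num"))                   # 12345
```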