Python regex parse using re.sub() with multiple possibilities

Time:10-28

I am trying to extract the number embedded in a string. The string has three possible forms:

  1. 'avenue\d+', where \d+ is a number with one or more digits, or
  2. 'road\d+', or
  3. 'lane\d+'

I tried:

re.sub(r'(?:avenue(\d+)|road(\d+)|lane(\d*))',r'\1','road12')

This code works for the first call below, but returns an empty string for the second:

re.sub(r'(?:avenue(\d+)|road(\d+)|lane(\d*))',r'\1','avenue12')
Out[81]: '12'
re.sub(r'(?:avenue(\d+)|road(\d+)|lane(\d*))',r'\1','road12')
Out[82]: ''

What am I doing incorrectly? Thanks.

CodePudding user response:

The capturing group that participated in the match was different. In the first case, it was Group 1, in the second case, it was Group 2.
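To see which group participates for each input, here is a small sketch (the names `pattern` and `groups_by_text` are only illustrative, not from the question). It uses the question's pattern without the outer non-capturing group and prints the captured groups together with `lastindex`, the index of the last group that matched:

```python
import re

# The question's alternation: each alternative has its own capturing group.
pattern = re.compile(r'avenue(\d+)|road(\d+)|lane(\d*)')

groups_by_text = {}
for text in ('avenue12', 'road12', 'lane12'):
    m = pattern.search(text)
    groups_by_text[text] = m.groups()
    # Groups that did not participate in the match are reported as None.
    print(text, m.groups(), m.lastindex)
```

For 'road12' only the second group holds '12', which is why a replacement of r'\1' alone produces an empty string.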

Also note that the outer non-capturing group is superfluous and can be removed.

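To fix the immediate issue, you can use r'\1\2\3' as the replacement. Only one of the three groups participates in any given match, and since Python 3.5 the groups that did not participate expand to an empty string:

re.sub(r'avenue(\d+)|road(\d+)|lane(\d+)',r'\1\2\3','road12')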
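An alternative to concatenating all three backreferences, sketched here as one possible approach rather than the answer's own method, is to pass a function as the replacement and return whichever group matched. `PATTERN` and `keep_number` are hypothetical names chosen for this example:

```python
import re

# One capturing group per alternative, as in the fix above.
PATTERN = re.compile(r'avenue(\d+)|road(\d+)|lane(\d+)')

def keep_number(match):
    # lastindex is the index of the last capturing group that matched;
    # with this pattern exactly one group matches, so it is never None.
    return match.group(match.lastindex)

result = PATTERN.sub(keep_number, 'road12')
print(result)
```

This avoids relying on how unmatched groups are expanded in a replacement template.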

However, it seems extracting is much simpler here:

import re

m = re.search(r'(?:avenue|road|lane)(\d+)','road12')
if m:
    print(m.group(1))


Details:

  • (?:avenue|road|lane) - either avenue, road, or lane
  • (\d+) - Group 1: one or more digits.
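The extraction approach can be wrapped in a small helper. This is a minimal sketch; `STREET_NUMBER` and `extract_number` are names made up for illustration, and returning None when nothing matches is an assumption about the desired behaviour:

```python
import re

# Shared prefix alternatives in a non-capturing group, digits in Group 1.
STREET_NUMBER = re.compile(r'(?:avenue|road|lane)(\d+)')

def extract_number(text):
    """Return the digits following avenue/road/lane, or None if absent."""
    m = STREET_NUMBER.search(text)
    return m.group(1) if m else None

for s in ('avenue12', 'road12', 'lane7', 'street9'):
    print(s, extract_number(s))
```

The result is a string; wrap it in int() if a numeric value is needed.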

CodePudding user response:

Would this work? The part that varies (avenue, road, or lane) can go in a non-capturing group, followed by a single capturing group for the number:

re.sub(r'(?:avenue|road|lane)(\d+)',r'\1','road12')
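One thing to keep in mind with this version: re.sub() only replaces the matched part, so any surrounding text is kept. The sketch below compares it with re.search() on a made-up input ('5 road12 north' is an illustrative string, not from the question):

```python
import re

pattern = r'(?:avenue|road|lane)(\d+)'
text = '5 road12 north'  # hypothetical input with extra text around the match

# sub() rewrites only the match and leaves the rest of the string in place.
substituted = re.sub(pattern, r'\1', text)

# search() returns just the captured digits.
m = re.search(pattern, text)
extracted = m.group(1) if m else None

print(repr(substituted))
print(repr(extracted))
```

If the input is always exactly one of the three forms, both approaches give the same result.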