I'm trying to use a regex to find all runs of digits that are followed by either a whitespace character or a dash.
Right now this is what it looks like:
import re
txt = "123 4 56-7 maine x1s56"
x = re.findall(r"(\d+\s|-\b)", txt)
print(x)
The result is:
['123 ', '4 ', '-', '7 ']
But it should print:
['123 ', '4 ', '56', '7 ']
CodePudding user response:
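So, the tricky part is that both the space and the hyphen are part of the condition, but only the space should be included in the match. You need the hyphen inside a lookahead but the space outside it. With re.findall the group should also be non-capturing, otherwise only the group's contents are returned. Something like this:
\d+(?:\s|(?=-))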
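As a quick check of the lookahead idea, here is a minimal sketch run against the sample string from the question. It assumes re.findall is still the function being used; the variable names `captured` and `matches` are only illustrative. It also shows why a capturing group would not give the desired list.

```python
import re

txt = "123 4 56-7 maine x1s56"

# With a capturing group, findall returns only what the group captured
# (a whitespace character, or an empty string when the lookahead branch matched).
captured = re.findall(r"\d+(\s|(?=-))", txt)
print(captured)  # [' ', ' ', '', ' ']

# (?:...) groups without capturing, so findall returns the whole match.
# (?=-) requires a dash to follow but does not consume it.
matches = re.findall(r"\d+(?:\s|(?=-))", txt)
print(matches)  # ['123 ', '4 ', '56', '7 ']
```

The trailing 56 in x1s56 is not matched because it is followed by neither whitespace nor a dash.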
CodePudding user response:
The alternation operator (|) has the lowest precedence of all regex operators. That is, it tells the regex engine to match either everything to the left of the vertical bar or everything to the right of the bar.
So the regular expression r"(\d+\s|-\b)" means (one or more digits followed by a whitespace character) OR (a dash followed by a word boundary).
If you want to limit the reach of the alternation, you need to use parentheses for grouping. Or, since you want to alternate between only two characters, you can use a character class instead.
import re
txt = "123 4 56-7 maine x1s56"
x = re.findall(r"\d+[\s-]", txt)
print(x)
Output:
['123 ', '4 ', '56-', '7 ']
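To make the precedence point concrete, the following sketch compares an ungrouped alternation with a grouped one on the same input. The variable names are illustrative, and the grouped pattern is just one possible way to write it.

```python
import re

txt = "123 4 56-7 maine x1s56"

# Ungrouped: the bar splits the whole pattern into "\d+\s" OR "-\b",
# so a lone dash can match without any digits in front of it.
ungrouped = re.findall(r"\d+\s|-\b", txt)
print(ungrouped)  # ['123 ', '4 ', '-', '7 ']

# Grouped (non-capturing): digits, followed by either whitespace or a dash.
grouped = re.findall(r"\d+(?:\s|-)", txt)
print(grouped)  # ['123 ', '4 ', '56-', '7 ']
```

The grouped form gives the same result as the character class version, and in both the dash is part of the match. If the dash should be required but left out of the result, it has to go into a lookahead instead.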