Consider this string:
text = '''
4 500,5
12%
1,63%
568768,74832 days in between
34 cars in a row'''
As you can see, there are simple numbers, numbers with spaces in between, numbers with commas, and numbers with both. Thus, 4 500,5 is considered a standalone, separate number. Extracting the numbers with commas and spaces is easy, and I found this pattern:
pattern = re.compile(r'(\d+ )?\d+,\d+')
However, I am struggling to extract just the simple numbers like 12 and 34. I tried using (?!...) and [^...], but these options do not allow me to exclude the unwanted parts of other numbers.
CodePudding user response:
((?:\d+ )?\d+,\d+)|(\d+(?! \d))
I believe this will do what you want (Regexr link: https://regexr.com/695tc)
To capture "simple" numbers, it looks for [one or more digits], which are not followed by [a space and another digit].
I edited so that you can use capture groups appropriately, if desired.
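As a rough illustration of how the two capture groups could be used, here is a minimal Python sketch. It assumes the pattern is written with `+` quantifiers (`\d+`) and reuses the sample text from the question; the list names `complex_numbers` and `simple_numbers` are made up for this example.

```python
import re

text = '''
4 500,5
12%
1,63%
568768,74832 days in between
34 cars in a row'''

# Group 1: numbers containing a comma, optionally preceded by a
#          space-separated digit group (e.g. "4 500,5").
# Group 2: "simple" numbers, i.e. digits not followed by a space and a digit.
pattern = re.compile(r'((?:\d+ )?\d+,\d+)|(\d+(?! \d))')

complex_numbers = []
simple_numbers = []
for m in pattern.finditer(text):
    if m.group(1) is not None:
        complex_numbers.append(m.group(1))
    else:
        simple_numbers.append(m.group(2))

print(complex_numbers)  # ['4 500,5', '1,63', '568768,74832']
print(simple_numbers)   # ['12', '34']
```

Using `finditer` rather than `findall` keeps it explicit which group produced each match.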
CodePudding user response:
If you only want to match 12 and 34:
(?<!\S)\d+\b(?![^\S\n]*[,\d])
The pattern in parts:
(?<!\S)          Assert a whitespace boundary to the left
\d+\b            Match 1+ digits and a word boundary
(?!              Negative lookahead, assert that what is directly to the right is not
[^\S\n]*[,\d]    Match optional spaces (no newlines) and either , or a digit
)                Close lookahead
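A short sketch of this pattern applied to the sample text from the question, assuming the `\d+` form of the pattern; the variable name `simple` is only illustrative.

```python
import re

text = '''
4 500,5
12%
1,63%
568768,74832 days in between
34 cars in a row'''

# Digits with whitespace (or start of string) on the left, ending at a word
# boundary, and not followed by optional same-line spaces plus a comma or digit.
simple = re.compile(r'(?<!\S)\d+\b(?![^\S\n]*[,\d])')

matches = simple.findall(text)
print(matches)  # ['12', '34']
```

Because the pattern has no capture groups, `findall` returns the full matches directly.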