There is a string s = 'kjlj lkj3 444 2345 add56fg', and I wonder how to match only numbers of length 1 to 3. So the number '2345' should not be returned; the result should be ['3', '444', '56']. My first approach was to use the expression r'\d{1,3}', but it returns ['3', '444', '234', '5', '56']. Then I came up with the idea of extracting all digit runs first and keeping only those with len <= 3: r'\d+' -> ['3', '444', '2345', '56'], then filter by len <= 3. That works, but I wonder whether it is possible to achieve this using only regex.
CodePudding user response:
r'\d{1,3}' doesn't work because nothing ensures that there is no digit immediately before or after the match, so it can return a piece of a longer number.
You should use a negative lookbehind and a negative lookahead to make sure you don't just capture part of the number.
(?<!\d)\d{1,3}(?!\d)
In Python:
import re
s = 'kjlj lkj3 444 2345 add56fg'
data = re.findall(r'(?<!\d)\d{1,3}(?!\d)', s)
print(data) # ['3', '444', '56']
CodePudding user response:
Well, this looks like a use case for lookarounds.
You need to match \d{1,3} not preceded and not followed by another digit. The former condition is a negative lookbehind and the latter is a negative lookahead.
Both are zero-width assertions: they add no characters to the final result, but they do affect what text is matched.
(?<!\d)\d{1,3}(?!\d)
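To illustrate the zero-width point, here is a minimal Python sketch using the string from the question. The variable names (naive, bounded, spans) are only illustrative. It compares the plain pattern with the lookaround version and shows that each match span covers just the digits:

```python
import re

s = 'kjlj lkj3 444 2345 add56fg'

# Without lookarounds, \d{1,3} splits the 4-digit run '2345' into pieces.
naive = re.findall(r'\d{1,3}', s)

# (?<!\d) rejects a start position that is preceded by a digit;
# (?!\d) rejects an end position that is followed by a digit.
pattern = re.compile(r'(?<!\d)\d{1,3}(?!\d)')
bounded = pattern.findall(s)

# The lookarounds consume nothing: each span covers only the digits themselves.
spans = [(m.group(), m.span()) for m in pattern.finditer(s)]

print(naive)    # ['3', '444', '234', '5', '56']
print(bounded)  # ['3', '444', '56']
print(spans)    # [('3', (8, 9)), ('444', (10, 13)), ('56', (22, 24))]
```

Every candidate inside '2345' is rejected: a match starting at '2' is always followed by another digit, and any later start position is preceded by one.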