I'm trying to get the first number (int and float) after a specific pattern:
strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
for x in strings:
    print(<rule>)
Wanted result:
'38'
'10.5'
I tried:
for x in strings:
    print(re.findall(f"(?<=Building).+\d+", x))
    print(re.findall(f"(?<=Building).+(\d+.?\d+)", x))
[' 38 House 10']
['10']
[' : 10.5 house 900']
['00']
But I'm missing something.
CodePudding user response:
You could use a capture group:
\bBuilding[\s:]+(\d+(?:\.\d+)?)\b
Explanation
\bBuilding
Match the word Building
[\s:]+
Match 1+ whitespace chars or colons
(\d+(?:\.\d+)?)
Capture group 1, match 1+ digits with an optional decimal part
\b
A word boundary
import re
strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
pattern = r"\bBuilding[\s:]+(\d+(?:\.\d+)?)"
for x in strings:
    m = re.search(pattern, x)
    if m:
        print(m.group(1))
Output
38
10.5
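If the keyword varies, the same pattern can be wrapped in a small helper. This is only a sketch: the name first_number_after and the choice to return None when nothing matches are assumptions for illustration, not part of the answer above.

```python
import re

def first_number_after(keyword, text):
    """Return the first int/float (as a string) that follows `keyword`, or None.

    Hypothetical helper: the name and the None-on-miss behaviour are
    illustrative assumptions.
    """
    # re.escape keeps any regex metacharacters in the keyword literal.
    pattern = rf"\b{re.escape(keyword)}[\s:]+(\d+(?:\.\d+)?)"
    m = re.search(pattern, text)
    return m.group(1) if m else None

strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
results = [first_number_after("Building", s) for s in strings]
print(results)  # ['38', '10.5']
```

The match is case-sensitive, so "house" and "House" are different keywords here; passing re.IGNORECASE to re.search would relax that.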
CodePudding user response:
An idea is to use \D (negated \d) to match any non-digits in between and capture the number:
Building\D*\b([\d.]+)
Just to mention, use word boundaries \b around Building to match the full word.
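A minimal Python sketch of this idea, with the word boundaries around Building added as suggested. The variable names are placeholders chosen for the example.

```python
import re

strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]

# \bBuilding\b matches the whole word, \D* skips any non-digits,
# and the group captures a run of digits and dots.
pattern = re.compile(r"\bBuilding\b\D*\b([\d.]+)")

found = []
for x in strings:
    m = pattern.search(x)
    if m:
        found.append(m.group(1))

print(found)  # ['38', '10.5']
```

Note that [\d.]+ accepts any mix of digits and dots, so a trailing period is captured too ("Building 38." yields "38."). If that matters, \d+(?:\.\d+)? is a stricter alternative for the group.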
CodePudding user response:
re.findall(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+", x)
This finds every number in the given string that is not immediately preceded by a letter or a colon.
If you want only the first number, you can get it by indexing:
re.findall(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+", x)[0]
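Indexing with [0] raises IndexError when the string contains no number. A hedged alternative, assuming you would rather get None in that case, is to use re.search with the same pattern. The helper name first_number is made up for this sketch.

```python
import re

# Same pattern as above, compiled once for reuse.
NUMBER = re.compile(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+")

def first_number(text):
    """Return the first number in `text` as a string, or None if there is none."""
    m = NUMBER.search(text)
    return m.group(0) if m else None

print(first_number("Building 38 House 10"))       # 38
print(first_number("Building : 10.5 house 900"))  # 10.5
print(first_number("no digits here"))             # None
```

Keep in mind that this pattern is not tied to the word Building: for "House 7 Building 38" it returns "7", because that is the first number in the string.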