Python regex Get first element after specific string

Time:07-12

I'm trying to get the first number (int or float) that follows a specific pattern:

strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
for x in strings:
    print(<rule>)

Wanted result:

'38'
'10.5'

I tried:

for x in strings:
    print(re.findall(r"(?<=Building).+\d+", x))
    print(re.findall(r"(?<=Building).+(\d+.?\d+)", x))

Output

[' 38 House 10']
['10']
[' : 10.5 house 900']
['00']

But I'm missing something.

CodePudding user response:

You could use a capture group:

\bBuilding[\s:]+(\d+(?:\.\d+)?)\b

Explanation

  • \bBuilding Match the word Building
  • [\s:]+ Match 1+ whitespace chars or colons
  • (\d+(?:\.\d+)?) Capture group 1, match 1+ digits with an optional decimal part
  • \b A word boundary


import re
strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
pattern = r"\bBuilding[\s:]+(\d+(?:\.\d+)?)"
for x in strings:
    m = re.search(pattern, x)
    if m:
        print(m.group(1))

Output

38
10.5
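
As a side note on why the attempts in the question return the last number instead of the first: `.+` is greedy, so it consumes as much of the string as it can and only gives back enough characters for the final group to match. The sketch below compares the question's second pattern with a lazy variant; the lazy pattern and the variable names are my own illustration, not part of the answer above.

```python
import re

strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]

# Pattern from the question: greedy .+ swallows everything up to the last digits.
greedy = r"(?<=Building).+(\d+.?\d+)"
# Illustrative variant: lazy .+? stops at the first number it can reach.
lazy = r"(?<=Building).+?(\d+(?:\.\d+)?)"

greedy_results = [re.search(greedy, s).group(1) for s in strings]
lazy_results = [re.search(lazy, s).group(1) for s in strings]

print(greedy_results)  # ['10', '00']
print(lazy_results)    # ['38', '10.5']
```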

CodePudding user response:

Another idea is to use \D (the negation of \d) to skip any non-digits in between, then capture the number:

Building\D*\b([\d.]+)

See this demo at regex101 or Python demo at tio.run

Just to mention, use word boundaries \b around Building to match the full word.
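
A minimal Python sketch of this approach, with the word boundaries added around Building. The helper name `number_after_building` is made up for illustration. Note that `[\d.]+` is permissive: it would also accept a trailing dot or something like `1.2.3`.

```python
import re

strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]

# \D* skips any non-digits after the word; the group captures digits and dots.
pattern = re.compile(r"\bBuilding\b\D*\b([\d.]+)")

def number_after_building(text):
    """Return the first number after 'Building' as a string, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

for s in strings:
    print(number_after_building(s))
```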

CodePudding user response:

re.findall(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+", x)

This will find all numbers in the given string.

If you want the first number only you can access it simply through indexing:

re.findall(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+", x)[0]
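
One caveat: indexing with `[0]` raises an IndexError when the string contains no number, and this pattern is not tied to the word Building, so it returns the first number anywhere in the string. A small sketch that guards against the empty case; the helper name `first_number` is hypothetical:

```python
import re

# Same pattern as above, compiled once.
NUMBER = re.compile(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+")

def first_number(text):
    """Return the first number found in text as a string, or None."""
    matches = NUMBER.findall(text)
    return matches[0] if matches else None

print(first_number("Building 38 House 10"))       # 38
print(first_number("Building : 10.5 house 900"))  # 10.5
print(first_number("no digits here"))             # None
print(first_number("House 10 Building 38"))       # 10 (not anchored to Building)
```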