Extract all numbers (int and floats) after specific word

Time:11-05

Assuming I have the following string:

str = """
         HELLO 1 Stop #$**& 5.02‼️ 16.1 
         regex

         5 ,#2.3222
      """

I want to extract all numbers, whether int or float, that appear after the word "stop" (case-insensitive). The expected result is:

[5.02, 16.1, 5, 2.3222]

The farthest I have come so far uses the PyPI regex module, based on another post here:

regex.compile(r'(?<=stop.*)\d+(?:\.\d+)?', regex.I)

but this expression gives me only [5.02, 16.1]

CodePudding user response:

You get only the first 2 numbers, as .* does not match a newline.

You can update the flags to regex.I | regex.S so that the dot also matches a newline.

import regex

text = """
         HELLO 1 Stop #$**& 5.02‼️ 16.1 
         regex

         5 ,#2.3222
      """

pattern = regex.compile(r'(?<=\bstop\b.*)\d+(?:\.\d+)?', regex.I | regex.S)

print(regex.findall(pattern, text))

Output

['5.02', '16.1', '5', '2.3222']
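The effect of the dot-matches-newline flag can be seen in isolation. The following is a minimal sketch using the standard re module (where re.S plays the same role as regex.S); the sample string is made up for illustration and is not the text from the question.

```python
import re

# Hypothetical two-line sample, chosen only to show the flag's effect.
sample = "Stop 1.5\nnext 2"

# Default: "." does not match "\n", so the match ends at the line break.
without_flag = re.search(r"stop.*", sample, re.I).group()

# re.S (DOTALL): "." also matches "\n", so the match runs to the end of the text.
with_flag = re.search(r"stop.*", sample, re.I | re.S).group()

print(repr(without_flag))  # 'Stop 1.5'
print(repr(with_flag))     # 'Stop 1.5\nnext 2'
```

This is why the original pattern only saw the numbers on the same line as "Stop".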

See a Python demo


If you want to print the numbers after the word "stop", you can also use Python's built-in re module: match stop, then capture everything that follows in a group.

Then you can take that group 1 value, and find all the numbers.

import re
 
text = """
         HELLO 1 Stop #$**& 5.02‼️ 16.1 
         regex
 
         5 ,#2.3222
      """
pattern = r"\bStop\b(.+)"
 
m = re.search(pattern, text, re.S|re.I)
 
if m:
    print(re.findall(r"\d+(?:\.\d+)*", m.group(1)))

Output

['5.02', '16.1', '5', '2.3222']

CodePudding user response:

Yet another one, albeit with the newer regex module:

(?:\G(?!\A)|Stop)\D+\K\d+(?:\.\d+)?

See a demo on regex101.com.


In Python, this could be

import regex as re

string = """
         HELLO 1 Stop #$**& 5.02‼️ 16.1 
         regex

         5 ,#2.3222
      """

pattern = re.compile(r'(?:\G(?!\A)|Stop)\D+\K\d+(?:\.\d+)?')

numbers = pattern.findall(string)
print(numbers)

And would yield

['5.02', '16.1', '5', '2.3222']

Don't name your variables after built-in functions and types such as str, list, or dict.


If you need to go further and limit your search within some bounds (e.g. all numbers between Stop and end), you could as well use

(?:\G(?!\A)|Stop)(?:(?!end)\D)+\K\d+(?:\.\d+)?
#           ^^^        ^^^

See another demo on regex101.com.
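If the regex module is not available, a similar bounded search can be approximated with the standard re module. The sketch below is not the pattern above: it captures the text between the two markers and then extracts the numbers from it. The function name numbers_between and its parameters are hypothetical, and it assumes both markers should be matched as whole words, case-insensitively.

```python
import re

def numbers_between(text, start="stop", end="end"):
    """Return number strings found after `start` and before `end` (or end of text)."""
    # Lazily capture everything after `start` up to `end`, or up to the end of
    # the string if `end` never appears.
    pattern = rf"\b{re.escape(start)}\b(.*?)(?:\b{re.escape(end)}\b|\Z)"
    m = re.search(pattern, text, re.I | re.S)
    if not m:
        return []
    return re.findall(r"\d+(?:\.\d+)?", m.group(1))

print(numbers_between("1 Stop 2 3.5 end 4"))  # ['2', '3.5']
```

Numbers before "Stop" and after "end" are ignored; if "end" is missing, everything after "Stop" is searched.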

CodePudding user response:

You could use:

import re

inp = """
HELLO 1 Stop #$**& 5.02‼️ 16.1 
regex

5 ,#2.3222"""

nums = []
if re.search(r'\bstop\b', inp, flags=re.I):
    inp = re.sub(r'^.*?\bstop\b', '', inp, flags=re.S|re.I)
    nums = re.findall(r'\d+(?:\.\d+)?', inp)

print(nums)  # ['5.02', '16.1', '5', '2.3222']

The if logic above ensures that we only attempt to populate the list of numbers if we are certain that Stop appears in the input text. Otherwise, the default output is just an empty list. If Stop does appear, then we strip off that leading portion of the string before using re.findall to find all numbers appearing afterwards.
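The same steps can be wrapped in a function that also converts the matches to numbers, since the question asks for [5.02, 16.1, 5, 2.3222] rather than strings. This is only a sketch: the name numbers_after is hypothetical, the sample is a simplified version of the question's text, and it assumes a token containing a dot should become a float and any other token an int.

```python
import re

def numbers_after(text, word="stop"):
    """Return ints and floats found after the first occurrence of `word`."""
    m = re.search(rf"\b{re.escape(word)}\b", text, flags=re.I)
    if not m:
        return []  # the word never appears, so there is nothing to extract
    tail = text[m.end():]  # everything after the matched word
    tokens = re.findall(r"\d+(?:\.\d+)?", tail)
    # Assumption: tokens with a decimal point are floats, the rest are ints.
    return [float(t) if "." in t else int(t) for t in tokens]

sample = "HELLO 1 Stop #$**& 5.02 16.1\nregex\n\n5 ,#2.3222"
print(numbers_after(sample))  # [5.02, 16.1, 5, 2.3222]
```

Slicing at m.end() avoids the separate re.sub step while keeping the same "skip everything up to the word" behaviour.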

CodePudding user response:

import re

_string = """
          HELLO 1 Stop #$**& 5.02‼️ 16.1
          regex

          5 ,#2.3222
       """

start = _string.find("Stop") + len("Stop")
print(re.findall(r"[-+]?\d*\.?\d+", _string[start:]))   # ['5.02', '16.1', '5', '2.3222']
