Extract sentences with numbers in it

Time:10-14

I am trying to extract the segments of a sentence that contain numbers. For example, the string

"This is a sentence with numbers, and this is not a sentence with numbers because 123."

should return

"and this is not a sentence with numbers because 123."

I know that there are ways to extract digits from a string, but I am not sure how to find the indices and subsequently extract the required substring. Any help is appreciated.

CodePudding user response:

This should work for the given question, but as commented, the question has room for errors and assumptions.

import re

test = "This is a sentence with numbers, and this is not a sentence with numbers because 123"

# Split on commas and print every segment that contains at least one digit.
for t in test.split(','):
    num = re.findall(r'\d+', t)
    if num:
        print(t.strip())

CodePudding user response:

First split the text on ".". A plain split would also break "1.2" apart, so we use \.(?!\d)|(?<!\d)\. instead, which matches a "." that is either not followed by a digit or not preceded by a digit. The pattern also splits on commas.

import re

text = "Test 1.2,Test A.1 Test,3 Test 4."

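# Split on commas and on dots that are not part of a decimal number,
# then print every segment that contains at least one digit.
for t in re.split(r'\.(?!\d)|(?<!\d)\.|,', text):
    num = re.findall(r'\d+', t)
    if num:
        print(t.strip())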

CodePudding user response:

Comment:

I would take the whole string and split it into substrings at the characters that mark the start of a new substring (such as . , ! ? and so on). Then scan each substring for a number and return it only if it contains one. You could also use the re module to split the string.
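The approach described in this comment could be sketched roughly as follows. This is only an illustration, not a definitive solution: the delimiter set (comma, "!", "?", and any "." that is not between two digits) and the function name segments_with_numbers are assumptions made for the example.

```python
import re

# Assumed delimiters: ',', '!', '?', and any '.' that does not sit
# between two digits (so a decimal such as "1.2" stays in one piece).
DELIMITERS = re.compile(r"[,!?]|(?<!\d)\.|\.(?!\d)")


def segments_with_numbers(text):
    """Return the stripped segments of `text` that contain at least one digit."""
    parts = (part.strip() for part in DELIMITERS.split(text))
    return [part for part in parts if re.search(r"\d", part)]


print(segments_with_numbers(
    "This is a sentence with numbers, and this is not a sentence with numbers because 123."
))
```

Returning a list instead of printing inside the loop makes it easier to reuse the result. The delimiter pattern would need adjusting if, for example, thousands separators such as "1,000" should stay together.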
