I am trying to extract the sentence segments that contain numbers. For example, the string
"This is a sentence with numbers, and this is not a sentence with numbers because 123."
should return
"and this is not a sentence with numbers because 123."
I know that there are ways to extract digits from a string, but I am not sure how to find the indices and subsequently extract the required string. Any help is appreciated.
CodePudding user response:
This should work for the given question, but as commented, the question has room for errors and assumptions.
import re

test = "This is a sentence with numbers, and this is not a sentence with numbers because 123"
# Split on commas and print each segment that contains at least one digit.
for t in test.split(','):
    num = re.findall(r'\d+', t)
    if num:
        print(t.strip())
CodePudding user response:
First split the text on ".", but a plain split would also break a decimal such as "1.2".
So we use \.(?!\d)|(?<!\d)\.
which means: a "." that is not followed by a digit, or not preceded by a digit.
import re

text = "Test 1.2,Test A.1 Test,3 Test 4."
# Split on commas and on any "." that does not sit between two digits.
for t in re.split(r'\.(?!\d)|(?<!\d)\.|,', text):
    num = re.findall(r'\d+', t)
    if num:
        print(t)
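To see what the lookarounds change, here is a small comparison sketch on the same sample string. The variable names `naive` and `guarded` are only illustrative; the point is that a plain `str.split(".")` cuts the decimal in two, while the lookaround pattern leaves it intact.

```python
import re

text = "Test 1.2,Test A.1 Test,3 Test 4."

# A plain split on "." also cuts the decimal "1.2" in two.
naive = text.split(".")

# The lookaround pattern only splits on a "." that is missing a digit
# on at least one side, so "1.2" survives as one token.
guarded = re.split(r"\.(?!\d)|(?<!\d)\.", text)

print(naive)    # ['Test 1', '2,Test A', '1 Test,3 Test 4', '']
print(guarded)  # ['Test 1.2,Test A', '1 Test,3 Test 4', '']
```

Both results end with an empty string because the text ends with a delimiter; filtering for digits afterwards drops it.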
CodePudding user response:
Comment:
I would take the whole string and split it into substrings on the characters that mark a new segment (such as . , ! ? and so on). Then scan each substring for a number and return it only if it contains one. You could also use the re module to split the string.
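A minimal sketch of that idea, assuming the delimiters are , ! ? ; and any "." that is not part of a decimal number. The function name `segments_with_numbers` is made up for illustration, and the delimiter set is an assumption you may need to adjust for your text.

```python
import re

def segments_with_numbers(text):
    """Return the punctuation-delimited segments of text that contain a digit."""
    # Split on , ! ? ; and on any "." that is not between two digits,
    # so a decimal such as "1.2" stays in one piece.
    parts = re.split(r"[,!?;]|\.(?!\d)|(?<!\d)\.", text)
    # Keep only the segments that contain at least one digit,
    # trimming surrounding whitespace.
    return [p.strip() for p in parts if re.search(r"\d", p)]

print(segments_with_numbers(
    "This is a sentence with numbers, and this is not a sentence with numbers because 123."
))
# ['and this is not a sentence with numbers because 123']
```

Note that the delimiters themselves are dropped, so the trailing "." from the question's expected output is not part of the returned segment.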