I'm trying to extract, from the text below, the value that follows the word "number" and the sentence that comes after it.
Text:
The conditions are: number 1, the patient is allergic to dust, number next, the patient has bronchitis, number 4, The patient heart rate is high.
From this text I want to extract the following values:
1, the patient is allergic to dust,
next, the patient has bronchitis,
4, The patient heart rate is high
I have a pattern that lets me get the value next to "number" and the first word of the sentence:
(numbers? (\d+|next)[,.]?\s?(\w+))
This is the result using re.findall:
[('number 1, the', '1', 'the'),
('number next, the', 'next', 'the'),
('number 4, The', '4', 'The')]
As you can see, using groups I can extract the digit or the "next" value from the text, but I have not been able to extract the entire sentence.
CodePudding user response:
Since the "." and "," and the whitespace character are optional after the digits or "next", you could write the pattern with a non-greedy dot and a lookahead that asserts either " number(s)" again to the right or the end of the string:
\bnumbers? (\d+|next)[,.]?\s?(\w.*?)(?= numbers?\b|\.?$)
import re
pattern = r"\bnumbers? (\d+|next)[,.]?\s?(\w.*?)(?= numbers?\b|\.?$)"
s = "The conditions are: number 1, the patient is allergic to dust, number next, the patient has bronchitis, number 4, The patient heart rate is high."
print(re.findall(pattern, s))
Output
[
('1', 'the patient is allergic to dust,'),
('next', 'the patient has bronchitis,'),
('4', 'The patient heart rate is high')
]
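Note that the first two sentences keep their trailing comma, because the lazy match runs right up to the space before the next "number". If that comma is not wanted, one option is to strip trailing punctuation after matching. A minimal sketch, assuming the same input string; the helper name extract_conditions is hypothetical and not part of the original answer:

```python
import re

# Same pattern as above: lazy match up to the next " number(s)" or the end of the string.
PATTERN = re.compile(r"\bnumbers? (\d+|next)[,.]?\s?(\w.*?)(?= numbers?\b|\.?$)")

def extract_conditions(text):
    """Return (label, sentence) pairs with trailing spaces, commas and periods removed."""
    return [(label, sentence.rstrip(" ,.")) for label, sentence in PATTERN.findall(text)]

s = ("The conditions are: number 1, the patient is allergic to dust, "
     "number next, the patient has bronchitis, number 4, The patient heart rate is high.")
pairs = extract_conditions(s)
print(pairs)
```

With the sample text, every sentence comes back without the trailing comma, while the labels ("1", "next", "4") are unchanged.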
CodePudding user response:
Try (regex101):
import re
s = "The conditions are: number 1, the patient is allergic to dust, number next, the patient has bronchitis, number 4, The patient heart rate is high."
pat = re.compile(r"numbers? (\d+|next)[,.]?\s?([^,.]+)")
print(pat.findall(s))
Prints:
[
("1", "the patient is allergic to dust"),
("next", "the patient has bronchitis"),
("4", "The patient heart rate is high"),
]
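The two answers behave differently when a sentence itself contains a comma or a period: the negated character class stops at the first one, while the lazy match with a lookahead keeps going until the next "number". A small sketch to illustrate the difference; the input string t is a made-up example, not taken from the question:

```python
import re

# Pattern from the first answer: lazy match bounded by a lookahead.
lookahead = re.compile(r"\bnumbers? (\d+|next)[,.]?\s?(\w.*?)(?= numbers?\b|\.?$)")
# Pattern from the second answer: negated character class, stops at the first "," or ".".
negated = re.compile(r"numbers? (\d+|next)[,.]?\s?([^,.]+)")

# Hypothetical input: the first sentence contains an internal comma.
t = ("number 1, the patient is allergic to dust, pollen and mold, "
     "number 2, the patient has bronchitis.")

with_lookahead = lookahead.findall(t)
with_negated = negated.findall(t)
print(with_lookahead)
print(with_negated)
```

On this input the lookahead version returns the whole first sentence (including ", pollen and mold,"), while the negated-class version cuts it off after "dust". Which one fits depends on whether the sentences can contain punctuation of their own.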