How to extract sentences from the paragraph which contains numeric value. There is one constraint. T

Time:10-31

I need a regex pattern for Python. If you have any other approach, feel free to share it.

I have written a regex that extracts the sentences containing a numeric value (pattern: ([^.]*?\d+[^.]*\.)), but I do not know how to put a constraint on that numeric value.

Pattern for extracting sentences that contain a numeric value: ([^.]*?\d+[^.]*\.)

Example:

The patient is suffering from fever. Their relatives come to visit them. The patient age is 20 year. His brother could not visit him due to some other work. There is another patient whose age is 30 year old. The second patient is watching him from window.

Output:

['The patient age is 20 year', 'There is another patient whose age is 30 year old']

CodePudding user response:

If we assume that dots would only occur as periods ending the sentences, then the following approach might work:

import re

inp = 'The patient is suffering from fever. Their relatives come to visit them. The patient age is 20 year. His brother could not visit him due to some other work. There is another patient whose age is 30 year old. The second patient is watching him from window.'
# Capture each sentence that contains a standalone integer from 0 to 85
matches = re.findall(r'\s*([^.]*\b(?:[1-7]?[0-9]|8[0-5])\b[^.]*\.)', inp)
print(matches)

This prints:

['The patient age is 20 year.', 'There is another patient whose age is 30 year old.']

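As a note, the (?:[1-7]?[0-9]|8[0-5]) portion of the regex matches the integers 0 through 85 inclusive, and the surrounding \b word boundaries stop it from matching part of a longer number such as 120.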
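
Since the question is open to other approaches, here is a minimal sketch that moves the range check out of the regex and into plain integer comparison, which may be easier to adjust if the limits change. The function name sentences_with_number_in_range and the low/high defaults (0 and 85, taken from the range above) are illustrative assumptions, and the sketch keeps the same assumption that a dot only ever ends a sentence.

```python
import re

def sentences_with_number_in_range(text, low=0, high=85):
    """Return the sentences that contain at least one integer in [low, high]."""
    # Split after each period followed by whitespace.
    # Assumption: dots appear only as sentence-ending periods.
    sentences = [s.strip() for s in re.split(r'(?<=\.)\s+', text) if s.strip()]
    selected = []
    for sentence in sentences:
        # \b\d+\b picks out whole integers, so 120 is read as 120, not 12 or 20.
        numbers = [int(n) for n in re.findall(r'\b\d+\b', sentence)]
        if any(low <= n <= high for n in numbers):
            selected.append(sentence)
    return selected

inp = ('The patient age is 20 year. His brother is 120 year old. '
       'There is another patient whose age is 30 year old.')
print(sentences_with_number_in_range(inp))
# ['The patient age is 20 year.', 'There is another patient whose age is 30 year old.']
```

One difference from the pure-regex version: a number written with leading zeros, such as 020, is converted to 20 here and would pass the check.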
