I need a regex to extract sentences with specific words from a text

Time:10-21

I have a pattern that kind of does the job. For instance, this pattern:

[^.]*? employment ?[^.]*\.

applied to this text:

nominal wage growth continued to be rapid and broad based: average hourly earnings rose 5.2 percent over the 12 months ending in august, while the employment cost index of hourly compensation in the private sector, which also includes benefit costs, rose 5.5 percent over the 12 months ending in june, 2.4 percentage points faster than the year-earlier pace. consumer price inflation remained elevated.

returns:

2 percent over the 12 months ending in august, while the employment cost index of hourly compensation in the private sector, which also includes benefit costs, rose 5.

But it should return the whole sentence...

Bottom line: the pattern should ignore periods between digits, in abbreviations, and so on.

If someone could come up with a pattern that does that, I would highly appreciate it.

CodePudding user response:

I've updated your regular expression:

(?:[^.]|\d+\.\d+)*? employment ?(?:[^.]|\d+\.\d+)*\.(?!\d)

It now accepts dots that sit between digits. The last part is a negative lookahead that verifies that no digit follows the final dot, so a decimal point cannot be mistaken for the end of the sentence.

https://regex101.com/r/dTctZc/1
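For anyone who wants to try this outside regex101, here is a minimal Python sketch that runs both patterns against the sample text from the question. It assumes the digit groups in the updated pattern are meant as `\d+\.\d+`; the names `TEXT`, `ORIGINAL`, `UPDATED`, and `sentences_with_word` are made up for this illustration.

```python
import re

# Sample text from the question.
TEXT = (
    "nominal wage growth continued to be rapid and broad based: average hourly "
    "earnings rose 5.2 percent over the 12 months ending in august, while the "
    "employment cost index of hourly compensation in the private sector, which "
    "also includes benefit costs, rose 5.5 percent over the 12 months ending in "
    "june, 2.4 percentage points faster than the year-earlier pace. consumer "
    "price inflation remained elevated."
)

# Pattern from the question: it stops at every dot, including decimal points.
ORIGINAL = re.compile(r"[^.]*? employment ?[^.]*\.")

# Updated pattern (assumption: the digit quantifiers are \d+).
# A dot is allowed inside the sentence only when it sits between digits,
# and the closing dot must not be followed by a digit.
UPDATED = re.compile(
    r"(?:[^.]|\d+\.\d+)*? employment ?(?:[^.]|\d+\.\d+)*\.(?!\d)"
)


def sentences_with_word(pattern, text):
    """Return every match of `pattern` in `text`, with outer whitespace trimmed."""
    return [m.group(0).strip() for m in pattern.finditer(text)]


if __name__ == "__main__":
    print("original:", sentences_with_word(ORIGINAL, TEXT))
    print("updated: ", sentences_with_word(UPDATED, TEXT))
```

With the original pattern the match is cut at the decimal points; with the updated one it runs from "nominal wage growth" to "year-earlier pace." Note that this only handles dots between digits. Abbreviations such as "e.g." would still end the match early and need their own alternative in the group.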
