How to stop RegEx match at the end of a sentence?

Time:07-31

I want to split the text into chunks of a given character length.
The split should happen on the next whitespace character after hitting a specified character count.

This is my regex:

\S+.{0,14}(?= |$)

And here is sample input:

When done correctly, heat exposure offers tremendous benefits. However, it is extremely dangerous to use temperatures that are too hot. What is too hot? That will depend.

I’ve prepared a working example: https://regex101.com/r/JGJvtm/3

However, I also need a chunk to end when it reaches the end of a sentence, i.e. at [!.?](?= |$)

As seen in the example, “…benefits. However, it…” is currently treated as one chunk.
The split should happen after the period, and the next chunk should start with “However”, even though the character limit hasn’t been reached yet.

Feel free to propose an entirely different regex, whatever works.

CodePudding user response:

There is an issue with your current regex: the leading \S+ can match an arbitrarily long word, and the count of 14 characters only starts after that match. According to your description, the letters of that first word should also count towards the limit.

I would suggest this regex:

(?=\S)((?![!?.](\s|$)).){0,13}\S*

  • (?=\S) ensures that the match starts at a non-white-space character, without consuming it yet.
  • (?![!?.](\s|$)) ensures the next character is not a sentence terminator followed by white space or by the end of the input.
  • ((?![!?.](\s|$)).){0,13} matches up to 13 characters, applying that check before each one.
  • \S* matches the remaining non-white-space characters, so a chunk never ends in the middle of a word and a sentence terminator stays attached to the last word.
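
To see how the pieces above work together, here is a minimal sketch in Python, assuming Python's re module treats this pattern the same way as the regex101 flavor (it uses only lookaheads and plain quantifiers). The names CHUNK_RE and split_chunks are made up for illustration.

```python
import re

# Pattern from the answer above; 13 is the soft character limit.
CHUNK_RE = re.compile(r"(?=\S)((?![!?.](\s|$)).){0,13}\S*")

def split_chunks(text):
    """Return the non-overlapping chunks matched by CHUNK_RE, left to right."""
    # finditer + group(0) is used because findall would return the
    # contents of the capture groups instead of the whole match.
    return [m.group(0) for m in CHUNK_RE.finditer(text)]

# Sample input from the question.
sample = ("When done correctly, heat exposure offers tremendous benefits. "
          "However, it is extremely dangerous to use temperatures that are "
          "too hot. What is too hot? That will depend.")

chunks = split_chunks(sample)
for chunk in chunks:
    print(repr(chunk))
```

On the sample input, "benefits." comes out as its own chunk and the following chunk starts with "However", which is the behaviour asked for in the question.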

CodePudding user response:

\S.{13}\S*\s

https://regex101.com/r/2g4TBr/1

Start with one non-white-space character, then match any 13 characters, then catch the "tail" of the current word up to and including the following white-space character.
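
A quick way to check what this pattern returns is to run it over the sample from the question. This is only a sketch in Python, assuming its re module behaves like the regex101 flavor for this pattern; the name TAIL_RE is made up for illustration.

```python
import re

# Pattern from this answer: 1 non-space + 13 arbitrary characters,
# then the rest of the word, then one white-space character.
TAIL_RE = re.compile(r"\S.{13}\S*\s")

# Sample input from the question.
sample = ("When done correctly, heat exposure offers tremendous benefits. "
          "However, it is extremely dangerous to use temperatures that are "
          "too hot. What is too hot? That will depend.")

# The pattern has no capture groups, so findall returns the whole matches.
matches = TAIL_RE.findall(sample)
for m in matches:
    print(repr(m))
```

Two things to keep in mind with this pattern: every match includes its trailing white space, and because a final \s is required, the last piece of an input that does not end in white space (here "That will depend.") is not matched. The pattern also contains no check for sentence terminators.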
