Regex to capture a sentence and its punctuation?

Time:10-15

I have a paragraph of text:

EINE ZWEIWÖCHIGE EURAIL-REISE: PARIS, SCHWEIZ UND DEUTSCHLAND
Wenn Du meinen Blog in den letzten Monaten gelesen hast, wirst Du wissen, dass ich im März diesen Jahres während meiner Frühjahresferien zwei Wochen lang durch Europa gereist bin.
Das habe ich in diesen zwei Wochen wirklich ausgenutzt!

I'd like to capture each sentence and its punctuation, if any. In this example, that would include the newline (\n), the period (.), and the exclamation point (!). If the newline character can't be captured, I'm OK with just the sentence being captured.

This pattern, (?'s'.*)(?'p'(\.|\!))\n?, correctly captures the last two sentences, but not the first sentence or its \n.

Example: https://regex101.com/r/XS3lbQ/1

CodePudding user response:

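Add \n as a third alternative in the punctuation group, so that the first line, which ends without punctuation, also matches: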

(?'s'.*)(?'p'(?:\.|\!|\n))\n?
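
One way to check what this pattern captures is a short Python script. This is only a sketch under a few assumptions: Python writes named groups as (?P<s>...) rather than (?'s'...), the sample text is copied from the question without a trailing newline, and the names FIRST_ANSWER, TEXT and capture are made up for illustration.

```python
import re

# Python spelling of the pattern above: (?'s'...) becomes (?P<s>...).
FIRST_ANSWER = re.compile(r"(?P<s>.*)(?P<p>(?:\.|!|\n))\n?")

# Sample text from the question; the missing trailing newline is an assumption.
TEXT = (
    "EINE ZWEIWÖCHIGE EURAIL-REISE: PARIS, SCHWEIZ UND DEUTSCHLAND\n"
    "Wenn Du meinen Blog in den letzten Monaten gelesen hast, wirst Du wissen, "
    "dass ich im März diesen Jahres während meiner Frühjahresferien zwei Wochen "
    "lang durch Europa gereist bin.\n"
    "Das habe ich in diesen zwei Wochen wirklich ausgenutzt!"
)


def capture(pattern, text):
    """Return a (sentence, punctuation) pair for every match of pattern in text."""
    return [(m.group("s"), m.group("p")) for m in pattern.finditer(text)]


pairs = capture(FIRST_ANSWER, TEXT)
for sentence, punct in pairs:
    # Show only the tail of each sentence to keep the output short.
    print(repr(sentence[-15:]), repr(punct))
```

With this input all three lines match. Note that for the middle sentence the greedy .* runs to the end of the line first, so the period ends up inside s and p holds the newline.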

CodePudding user response:

(?'s'[^.!\n]+)(?'p'[.!\n])

This pattern needs fewer steps than the one in the first answer. It finishes in 25 (7 + 7 + 9 + 2) steps, whereas the pattern in the first answer finishes in 49 (12 + 12 + 17 + 8) steps. You can inspect the steps via the "Regex Debugger" link in the left menu of the demo.

Demo: https://regex101.com/r/lDDHbr/1

  • [^.!\n] => Any character except ".", "!", and "\n"
  • [^.!\n]+ => One or more "[^.!\n]". (It matches as many characters as possible because it's greedy.)
  • [.!\n] => One of the ".", "!", or "\n" characters.
  • (?'s'...) => A named capture group. ("s" is the name in this situation.)
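
The pieces listed above can be tried out in Python as a rough sketch. Assumptions: Python spells named groups (?P<s>...) instead of (?'s'...), the sample text is the one from the question without a trailing newline, and SECOND_ANSWER, SAMPLE and split_sentences are illustrative names only.

```python
import re

# Python spelling of the pattern: a run of characters that are not ".", "!"
# or a newline (group "s"), followed by exactly one of those three (group "p").
SECOND_ANSWER = re.compile(r"(?P<s>[^.!\n]+)(?P<p>[.!\n])")

# Sample text from the question; the missing trailing newline is an assumption.
SAMPLE = (
    "EINE ZWEIWÖCHIGE EURAIL-REISE: PARIS, SCHWEIZ UND DEUTSCHLAND\n"
    "Wenn Du meinen Blog in den letzten Monaten gelesen hast, wirst Du wissen, "
    "dass ich im März diesen Jahres während meiner Frühjahresferien zwei Wochen "
    "lang durch Europa gereist bin.\n"
    "Das habe ich in diesen zwei Wochen wirklich ausgenutzt!"
)


def split_sentences(text):
    """Return a (sentence, punctuation) pair for every match in text."""
    return [(m.group("s"), m.group("p")) for m in SECOND_ANSWER.finditer(text)]


result = split_sentences(SAMPLE)
for sentence, punct in result:
    # Show only the tail of each sentence to keep the output short.
    print(repr(sentence[-15:]), repr(punct))
```

Here the headline is paired with "\n", the second sentence with "." and the third with "!". The newline that follows the period is not part of any match, because p consumes only a single character.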