Regex - Pattern until double \n

Time:06-09

Somehow I am not able to find anything online about how to set a pattern ending to a double \n. My particular case is the following. I have this string:

"1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas"

And I would like to extract only the texts between digit\n and \n\n. So, in my case, I'd like to have

This is said \nby Matt.
While this is said by Lucas

Although I am not very skilled with RegEx, I tried many combinations such as (?<=\d\n).*?(?=\n\n), (?<=\d\n).\n\n and (?<=\d\n).*?(?=\r\n\r\n), but without any luck.

I have tried these and others with R's stringr library as well as with Python's re. The issue first came up in this answer: https://stackoverflow.com/a/72547966/19284124

CodePudding user response:

You can make the . match across lines with the (?s) inline modifier and extend the double newline pattern to alternatively match the end of string:

(?s)(?<=\d\n).*?(?=\n\n|\Z)

Details:

  • (?s) - a flag allowing . to match line break chars
  • (?<=\d\n) - a positive lookbehind that matches a location that is immediately preceded by a digit and a newline
  • .*? - any zero or more chars, as few as possible
  • (?=\n\n|\Z) - a positive lookahead that matches a location that is immediately followed by two newline chars or the end of the string.
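
A minimal sketch of this pattern with Python's re module, run on the sample string from the question. The variable names (text, pattern, captions) are only illustrative and do not come from the original post:

```python
import re

# Sample string from the question: two subtitle blocks separated by "\n\n".
text = (
    "1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n"
    "2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas"
)

# (?s) lets . cross newlines; the lookahead stops at a blank line or end of string.
pattern = r"(?s)(?<=\d\n).*?(?=\n\n|\Z)"

captions = re.findall(pattern, text)
for caption in captions:
    print(repr(caption))
# 'This is said \nby Matt.'
# 'While this is said by Lucas'
```

The lookbehind only succeeds after a line that ends in a digit, which here is the timestamp line, so the speaker lines ("1 Matt", "2 Lucas") are never used as a starting point.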

CodePudding user response:

This variant is more efficient, and because it avoids (?s) and \Z or \z it works in many regex flavors such as JavaScript, PHP, Python, Java, .NET, etc.:

(?<=\d\n)(?:.*\n)*?.*(?=\n\n|$)

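Make sure to use it without MULTILINE mode: in that mode $ matches at the end of every line, so the lookahead would already succeed at the end of the first text line.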
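
A short Python sketch of the second pattern on the question's sample string, with and without re.MULTILINE, to show why the flag matters. The names text, pattern, without_flag and with_flag are illustrative assumptions:

```python
import re

# Sample string from the question.
text = (
    "1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n"
    "2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas"
)

# No (?s): each .* stays on one line, and (?:.*\n)*? adds whole lines lazily.
pattern = r"(?<=\d\n)(?:.*\n)*?.*(?=\n\n|$)"

# Default mode: $ only matches at the end of the string
# (or just before a final trailing newline).
without_flag = re.findall(pattern, text)
print(without_flag)
# ['This is said \nby Matt.', 'While this is said by Lucas']

# MULTILINE: $ also matches before every "\n", so the first caption is cut short.
with_flag = re.findall(pattern, text, flags=re.MULTILINE)
print(with_flag)
# ['This is said ', 'While this is said by Lucas']
```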
