Specify end of regex group

Time:10-23

I am trying to create a regular expression which matches multiple groups, so the values between the groups can be extracted. Each group looks identical.

Let's consider the following example; note that the line breaks are intended:

dog 1
wuff
wuff
cat
123
XYZ
dog 1
wuff
wuff
cat
456
ABC
dog 1
wuff
wuff
cat
789

Thus, with the right regular expression I want to get the output:

123
XYZ
456
ABC
789

On regex101.com I tried:

(?s)(?:dog.*cat)

which matches everything between the first occurrence of dog and the last occurrence of cat.

In addition I tried:

(?s)(?:dog.*(cat){1})

which, with my limited knowledge, should match the first occurrence of cat and then end the group, but it does not.

I appreciate any help.

CodePudding user response:

You may use this regex in MULTILINE mode to capture the values that follow each dog ... cat block:

^dog\b(?:.*\n)+?cat\n(.*(?:\n.*)*?)(?=\ndog|\Z)

Your values are present in capture group #1
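
As a rough sketch of how this could be applied from code: the question only mentions regex101.com, so Python's `re` module is an assumption here, and the name `extract_values` is hypothetical. The pattern is written with the lazy `(?:.*\n)+?` quantifier (one or more lines), and the sample text is joined without a trailing newline, since `\Z` in Python matches only at the very end of the string.

```python
import re

# Sample input from the question, joined without a trailing newline.
TEXT = "\n".join([
    "dog 1", "wuff", "wuff", "cat", "123", "XYZ",
    "dog 1", "wuff", "wuff", "cat", "456", "ABC",
    "dog 1", "wuff", "wuff", "cat", "789",
])

# re.MULTILINE makes ^ match at the start of every line.
PATTERN = re.compile(
    r"^dog\b(?:.*\n)+?cat\n(.*(?:\n.*)*?)(?=\ndog|\Z)",
    re.MULTILINE,
)


def extract_values(text):
    """Return the lines found after each 'cat' line, up to the next 'dog'."""
    values = []
    for match in PATTERN.finditer(text):
        # Capture group 1 may span several lines; split it into single values.
        values.extend(match.group(1).split("\n"))
    return values


print("\n".join(extract_values(TEXT)))
```

If the input can end with a newline, stripping it first (for example with `text.rstrip("\n")`) keeps the last captured value clean.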

RegEx Details:

  • ^: Match the start of a line
  • dog\b: Match the word dog followed by a word boundary
  • (?:.*\n)+?: Match any line including its line break. Repeat this 1+ times (lazy), so matching stops at the first cat line
  • cat\n: Match cat followed by a newline
  • (.*(?:\n.*)*?): Capture the multiline values you're interested in as the first capture group
  • (?=\ndog|\Z): Lookahead asserting that the current position is followed by a line break and dog, or by the end of input
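
The "lazy" quantifier is what separates this from the two attempts in the question. A small sketch, again assuming Python's `re` module and a shortened sample text (the helper name `spans` is hypothetical), shows that greedy `.*` runs to the last cat, that `{1}` does not change this, and that a lazy `.*?` stops at the first one:

```python
import re

# Shortened sample: two dog ... cat blocks, each followed by one value.
TEXT = "dog 1\nwuff\ncat\n123\ndog 1\nwuff\ncat\n456"


def spans(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Greedy .* consumes as much as possible, then backtracks to the LAST cat.
greedy = spans(r"(?s)(?:dog.*cat)", TEXT)

# {1} only says "exactly one cat here"; .* before it is still greedy.
braced = spans(r"(?s)(?:dog.*(cat){1})", TEXT)

# Lazy .*? consumes as little as possible and stops at the FIRST cat.
lazy = spans(r"(?s)(?:dog.*?cat)", TEXT)

print(len(greedy), len(braced), len(lazy))
```

Here the greedy and `{1}` patterns each produce a single match spanning both blocks, while the lazy pattern produces one match per block.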