Capturing all text between delimiters only when there is a line feed

Time:04-13

I am trying to capture the text between two delimiters only when there is also a line feed within the delimiters. For example, given the following text:

Organisation Name <<me.company.name>>
ABN/ACN <<me.company.abn>>
Contact Name <<me.name>>
<<me.PhoneNumber

Another line>>
Email <<me.emailAddress>>

I only want to return <<me.PhoneNumber\n\nAnother line>>.

The \n could be anywhere. Basically, I only want matches that have at least one \n between << and >>, and to ignore all other << >> pairs.

The pattern I have so far is <<(.*?\n*)*?>>, but it matches every << >> pair (I'm using C#).

Here is an example of what I have tried: https://regex101.com/r/sb0wCs/1

Thanks so much for your help!

Answer:

You can use

<<((?:(?!<<|>>).)*?\n(?s:.)*?)>>

See the regex demo. Details:

  • << - a << string
  • ((?:(?!<<|>>).)*?\n(?s:.)*?) - Group 1:
    • (?:(?!<<|>>).)*? - zero or more chars (other than newline chars), none of which starts a << or >> sequence, as few as possible
    • \n - a LF char
    • (?s:.)*? - zero or more chars of any kind (including newline chars), as few as possible
  • >> - a >> string
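A quick way to check this pattern is to run it against the sample text from the question. The sketch below uses Python's re module instead of .NET's Regex (an assumption for illustration; the negative lookahead and the inline (?s:...) group behave the same way in both). Because the pattern has exactly one capturing group, re.findall returns only Group 1.

```python
import re

# Sample text from the question: only the PhoneNumber placeholder spans lines.
text = (
    "Organisation Name <<me.company.name>>\n"
    "ABN/ACN <<me.company.abn>>\n"
    "Contact Name <<me.name>>\n"
    "<<me.PhoneNumber\n"
    "\n"
    "Another line>>\n"
    "Email <<me.emailAddress>>\n"
)

# (?:(?!<<|>>).)*? stays on the first line and cannot step over a << or >>,
# so single-line placeholders fail before a \n is ever reached.
pattern = r"<<((?:(?!<<|>>).)*?\n(?s:.)*?)>>"

matches = re.findall(pattern, text)
print(matches)  # ['me.PhoneNumber\n\nAnother line']
```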

Answer:

You can try this: <<[^>]*?\n[^>]*>>

Test regex here: https://regex101.com/r/vD3EgE/2

<<[^>]*?\n[^>]*>>

<<      match literal <<
[^>]*?  match any chars that are not >, as few as possible
\n      match a newline
[^>]*   match any chars that are not >, as many as possible
>>      match literal >>
  • This will match only if there is a \n between << and >>.
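The same check for this pattern, again sketched with Python's re standing in for .NET's Regex (an assumption; negated character classes and lazy quantifiers behave alike in both). This pattern has no capturing group, so each result includes the << and >> delimiters.

```python
import re

# Sample text from the question.
text = (
    "Organisation Name <<me.company.name>>\n"
    "ABN/ACN <<me.company.abn>>\n"
    "Contact Name <<me.name>>\n"
    "<<me.PhoneNumber\n"
    "\n"
    "Another line>>\n"
    "Email <<me.emailAddress>>\n"
)

# [^>] also matches newlines (and '<'), so a match may span lines,
# but it can never step over a '>'; the \n must come before the first '>'.
pattern = r"<<[^>]*?\n[^>]*>>"

# No capturing group, so findall returns the whole match, delimiters included.
matches = re.findall(pattern, text)
print(matches)  # ['<<me.PhoneNumber\n\nAnother line>>']
```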

Answer:

In your pattern <<(.*?\n*)*?>>, every part of the repeated capture group is optional, including the newline, so the non-greedy quantifier *? just matches up to the first occurrence of >>, whether or not a newline was found in between.
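To see this, the asker's pattern can be run over the question's sample text. This is a sketch in Python's re (assumed here in place of .NET's Regex); it collects whole matches, and every << >> pair comes back, not just the multi-line one.

```python
import re

# Sample text from the question: five << >> pairs, one of them multi-line.
text = (
    "Organisation Name <<me.company.name>>\n"
    "ABN/ACN <<me.company.abn>>\n"
    "Contact Name <<me.name>>\n"
    "<<me.PhoneNumber\n"
    "\n"
    "Another line>>\n"
    "Email <<me.emailAddress>>\n"
)

# The asker's pattern: every part of the repeated group is optional,
# so the lazy *? simply stops at the first >> after each <<.
pattern = r"<<(.*?\n*)*?>>"

# Collect whole matches (group 0) for every << >> pair found.
whole_matches = [m.group(0) for m in re.finditer(pattern, text)]
print(len(whole_matches))  # 5
```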

Also, when a capture group is repeated, the group value holds only the value of the last iteration. Instead, put a single capture group without a quantifier around the whole part that you want to capture.
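A minimal illustration of the last-iteration behaviour, using a made-up <<abc>> string and Python's re (assumed here; in .NET, Group.Value behaves the same way, although .NET additionally exposes every iteration through Group.Captures):

```python
import re

# A quantified group keeps only the text of its last iteration...
repeated = re.fullmatch(r"<<(\w)+>>", "<<abc>>")
print(repeated.group(1))  # c

# ...while one unquantified group around the whole part keeps all of it.
single = re.fullmatch(r"<<(\w+)>>", "<<abc>>")
print(single.group(1))  # abc
```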


If your strings start at the beginning of the line, with the closing >> on a line of its own, you can use anchors and match at least one line in between that does not start with either << or >>:

^\s*<<(.*(?:\r?\n(?!<<|>>).*)+\r?\n)\s*>>$

Explanation

  • ^ Start of the string (or of a line when RegexOptions.Multiline is set)
  • \s*<< Match optional leading whitespace chars and <<
  • ( Capture group 1
    • .* Match the rest of the line
    • (?:\r?\n(?!<<|>>).*)+ Match a newline, and repeat at least 1 line not starting with << or >>
    • \r?\n Match a newline
  • ) Close group 1
  • \s*>> Match optional leading whitespace chars and >>
  • $ End of the string (or of a line when RegexOptions.Multiline is set)

See a regex demo.
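Because this variant relies on line anchors, it only fits text where << begins a line and >> sits on a line of its own. The sketch below uses Python's re as a stand-in for .NET's Regex (an assumption), with a hypothetical reshaping of the question's data that moves >> onto its own line. It passes re.MULTILINE, the counterpart of .NET's RegexOptions.Multiline, so that ^ and $ match at line boundaries. Group 1 keeps the final newline, since \r?\n sits inside the group.

```python
import re

# Hypothetical layout: << starts a line and >> sits on its own line.
text = (
    "Contact Name <<me.name>>\n"
    "<<me.PhoneNumber\n"
    "\n"
    "Another line\n"
    ">>\n"
    "Email <<me.emailAddress>>\n"
)

pattern = r"^\s*<<(.*(?:\r?\n(?!<<|>>).*)+\r?\n)\s*>>$"

# re.MULTILINE lets ^ and $ match at every line boundary.
match = re.search(pattern, text, re.MULTILINE)
print(repr(match.group(1)))  # 'me.PhoneNumber\n\nAnother line\n'

# With the question's original layout ("Another line>>"), nothing matches.
original = "<<me.PhoneNumber\n\nAnother line>>\n"
print(re.search(pattern, original, re.MULTILINE))  # None
```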
