Regex: match only when a substring is present between two words

Time:07-01

I want to limit a regex search to the text between 2 words, and match only when a given substring (here, sample) appears between them.

I have tried:

<Notes>(?:(?!<Notes>)[\s\S])*?sample(?:[\s\S\w] )<\/Notes>(?:(?<![\s\S\w]<\/Notes>))*?

with options /gmU

For Text:

TextBefore<Notes>this is sample notes</Notes>TextWiths1ample<Notes>this is sample notes</Notes>

and Text:

TextBefore<Notes>this is sample notes</Notes>TextWithsample<Notes>this is sample notes</Notes>

The screenshot below gives an idea of what I want to achieve: successful.

But the screenshot below shows that the regex is not limited to the text between the 2 words: failed

Hope someone can help me (there is a reason why not to parse this as XML).

Saved regexp: https://regex101.com/r/0fhNxI/1

CodePudding user response:

First of all, remove the U flag: it is very confusing because it swaps lazy and greedy quantifiers. Then, make sure you exclude both <Notes> and </Notes> from matching before sample. It is also a good idea to exclude sample itself at that point, too, and use

<Notes>(?:(?!<\/?Notes>|sample)[\s\S])*sample[\s\S]*?<\/Notes>

Or,

<Notes>(?:(?!<\/?Notes>)[\s\S])*?sample[\s\S]*?<\/Notes>

See the regex demo #1 and regex demo #2.

Note that [\s\S\w] = [\s\S].

Details:

  • <Notes> - a fixed string
  • (?:(?!<\/?Notes>)[\s\S])*? - any char, zero or more occurrences (as few as possible), that does not start a <Notes> or </Notes> char sequence
  • sample - a fixed string
  • [\s\S]*? - any zero or more chars as few as possible
  • <\/Notes> - a fixed string.
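
To make the breakdown above concrete, here is a minimal sketch in Python (an assumption; the question used regex101 with /gmU flags, and Python's re module has no U flag, so none is needed). The NAIVE pattern, the find_blocks helper, and the test string are hypothetical and added only for contrast; they are not from the answer.

```python
import re

# Tempered pattern from the answer: the lookahead stops the scan before
# "sample" from crossing any <Notes> or </Notes> tag.
TEMPERED = re.compile(r"<Notes>(?:(?!</?Notes>)[\s\S])*?sample[\s\S]*?</Notes>")

# Untempered variant, for contrast only (hypothetical, not from the answer).
NAIVE = re.compile(r"<Notes>[\s\S]*?sample[\s\S]*?</Notes>")


def find_blocks(pattern, text):
    """Return every full match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


# The first block has no "sample"; the word appears only between the
# blocks and inside the second block.
text = "A<Notes>no keyword here</Notes>TextWithsample<Notes>this is sample notes</Notes>"

print(find_blocks(TEMPERED, text))
print(find_blocks(NAIVE, text))
```

The tempered pattern returns only <Notes>this is sample notes</Notes>, while the untempered one starts at the first <Notes> and runs through the stray sample to the last </Notes>, which is the "not limited between the 2 words" behaviour described in the question.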

CodePudding user response:

/<([\w\s]*>).*?<\/\1/ig

You need a back reference and laziness. Maybe that was your problem.

Check the behaviour here: https://regex101.com/r/nVVsOQ/1
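
As a rough illustration of how the back reference and the lazy quantifier work together, here is a sketch in Python (an assumption; the answer uses /.../ig literal syntax, so the i flag becomes re.IGNORECASE and the g flag becomes finditer). The paired_blocks helper and the sample strings are hypothetical. Note that this pattern pairs any opening tag with its own closing tag and does not itself check for sample.

```python
import re

# Group 1 captures the tag name plus ">", and \1 requires the same text
# after "</", so only a matching closing tag ends the block.
PAIRED = re.compile(r"<([\w\s]*>).*?</\1", re.IGNORECASE)


def paired_blocks(text):
    """Return each full open-tag-to-matching-close-tag match."""
    # finditer + group(0) is used because findall would return only group 1.
    return [m.group(0) for m in PAIRED.finditer(text)]


print(paired_blocks("a<Notes>x</Notes>b<Other>y</Other>"))
print(paired_blocks("<A>x</B>"))
```

The first call yields the two blocks <Notes>x</Notes> and <Other>y</Other>; the second yields nothing, because </B> does not repeat the captured A>.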
