I want to match all text between two different tags, as long as the first tag doesn't appear again within the text in between.
So let's say the specific strings I want to match between are "<tag 1>hello</tag 1>" and "<tag 2>hi there</tag 2>", and the specific string I don't want in between them is "<tag 1>".
So I'd want a match with this:
<tag 1>hello</tag 1>
a bunch of text that includes newlines
<tag 2>hi there</tag 2>
But not a match with this:
<tag 1>hello</tag 1>
a bunch of text that includes newlines
<tag 2>something other than hi there</tag 2>
<tag 1>something other than hello</tag 1>
a bunch of text that includes newlines
<tag 2>hi there</tag 2>
I've tried
<tag 1>hello</tag 1>[\S\s]*?(?=<tag 1>|$)<tag 2>hi there</tag 2>
That doesn't work; it just doesn't match anything.
I'll be using Python for this, so the Python regex dialect would be good.
CodePudding user response:
"<tag 1>hello</tag 1>.*(\n .*|\s)*(?:(?!tag 1).)*(\n .*|\s)*.*<tag 2>hi there</tag 2>"mg
This regex:
(?:(?!tag 1).)*
- a non-capturing group that consumes one character at a time and refuses to step onto any position where the string "tag 1" starts
(\n .*|\s)*
- matches text on multiple lines
.*
- at the end of the expression allows multiple new lines between the 2 strings
<tag 1>hello</tag 1>.* ... .*<tag 2>hi there</tag 2>
- matches everything between the strings <tag 1>hello</tag 1> and <tag 2>hi there</tag 2>
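In Python the same idea can be written more compactly. The sketch below is a simplification, not the exact regex above: it keeps only the lookahead-guarded "." and relies on re.DOTALL so that "." also matches newlines. Note that the trailing mg flags are not Python syntax; Python takes flags such as re.DOTALL as an argument, and re.findall or re.finditer play the role of g. The names PATTERN, good and bad are made up for illustration.

```python
import re

# Guard every consumed character with a negative lookahead so the match
# can never step onto the start of another "<tag 1>".
PATTERN = re.compile(
    r"<tag 1>hello</tag 1>(?:(?!<tag 1>).)*?<tag 2>hi there</tag 2>",
    re.DOTALL,  # let "." match newlines as well
)

# The sample from the question that should match.
good = "\n".join([
    "<tag 1>hello</tag 1>",
    "a bunch of text that includes newlines",
    "<tag 2>hi there</tag 2>",
])

# The sample that should NOT match: "<tag 1>" reappears in between.
bad = "\n".join([
    "<tag 1>hello</tag 1>",
    "a bunch of text that includes newlines",
    "<tag 2>something other than hi there</tag 2>",
    "<tag 1>something other than hello</tag 1>",
    "a bunch of text that includes newlines",
    "<tag 2>hi there</tag 2>",
])

print(PATTERN.search(good) is not None)  # True
print(PATTERN.search(bad) is None)       # True
```

Checking one character at a time is easy to read but does a lookahead at every position, so it may be slow on very long inputs.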
CodePudding user response:
This worked:
<tag 1>hello</tag 1>(?:[^<]*(?:<(?!tag 1)[^<]*)*?)<tag 2>hi there</tag 2>
Thanks to bobble bubble who suggested it in a comment.
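For anyone who wants to check it in Python, here is a small sketch that runs this pattern against the two samples from the question, plus a third sample of my own with an unrelated tag in between. The names UNROLLED, find_block and the three sample variables are only for illustration.

```python
import re

# [^<]* eats plain text quickly; each "<" is only consumed when it does
# not start "<tag 1". No flags are needed because [^<] matches newlines.
UNROLLED = re.compile(
    r"<tag 1>hello</tag 1>(?:[^<]*(?:<(?!tag 1)[^<]*)*?)<tag 2>hi there</tag 2>"
)


def find_block(text):
    """Return the matched block, or None when there is no valid block."""
    match = UNROLLED.search(text)
    return match.group(0) if match else None


should_match = "\n".join([
    "<tag 1>hello</tag 1>",
    "a bunch of text that includes newlines",
    "<tag 2>hi there</tag 2>",
])

should_not_match = "\n".join([
    "<tag 1>hello</tag 1>",
    "a bunch of text that includes newlines",
    "<tag 2>something other than hi there</tag 2>",
    "<tag 1>something other than hello</tag 1>",
    "a bunch of text that includes newlines",
    "<tag 2>hi there</tag 2>",
])

# Made-up sample: other tags in between are still allowed.
other_tags = "\n".join([
    "<tag 1>hello</tag 1>",
    "some <b>bold</b> text",
    "<tag 2>hi there</tag 2>",
])

print(find_block(should_match) == should_match)  # True
print(find_block(should_not_match))              # None
print(find_block(other_tags) == other_tags)      # True
```

By contrast, the pattern from the question can never match: its lookahead requires "<tag 1>" or the end of the string at exactly the position where the pattern then demands "<tag 2>".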
CodePudding user response:
<tag 1>hello<\/tag 1>(?:\n)?(?!<tag 1>)[a-zA-Z\s\n]*<tag 2>hi there<\/tag 2>
<tag 1>hello<\/tag 1>(?:\n)?
matches the string '<tag 1>hello</tag 1>' and allows for a new line after it.
(?!<tag 1>)
makes sure the string '<tag 1>' does not appear at this position (negative lookahead)
[a-zA-Z\s\n]*
matches 0 or more letters, spaces and newlines
<tag 2>hi there<\/tag 2>
matches the string '<tag 2>hi there</tag 2>'
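One thing worth checking before relying on this: because the character class only allows letters and whitespace, no tag of any kind can appear in between, and neither can digits or punctuation. A rough sketch of that behaviour, with the lookahead written as '<tag 1>' (with the space); the names LETTERS_ONLY, plain, repeated and punctuated are made up for illustration.

```python
import re

# The class [a-zA-Z\s\n] has no "<", so any tag in between stops the
# match; it also has no digits or punctuation.
LETTERS_ONLY = re.compile(
    r"<tag 1>hello<\/tag 1>(?:\n)?(?!<tag 1>)[a-zA-Z\s\n]*<tag 2>hi there<\/tag 2>"
)

# Sample from the question that should match.
plain = "\n".join([
    "<tag 1>hello</tag 1>",
    "a bunch of text that includes newlines",
    "<tag 2>hi there</tag 2>",
])

# Sample from the question that should not match.
repeated = "\n".join([
    "<tag 1>hello</tag 1>",
    "a bunch of text that includes newlines",
    "<tag 2>something other than hi there</tag 2>",
    "<tag 1>something other than hello</tag 1>",
    "a bunch of text that includes newlines",
    "<tag 2>hi there</tag 2>",
])

# Made-up sample: same shape as `plain`, but with a comma and a digit.
punctuated = "\n".join([
    "<tag 1>hello</tag 1>",
    "a bunch of text, with 2 commas",
    "<tag 2>hi there</tag 2>",
])

print(LETTERS_ONLY.search(plain) is not None)   # True
print(LETTERS_ONLY.search(repeated) is None)    # True
print(LETTERS_ONLY.search(punctuated) is None)  # True, rejected by the class
```

If the text in between can contain anything other than letters and whitespace, the character class would need to be widened, for example to [^<]*.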