I need to extract ONLY the entire contents of all the <a> tags that contain a <minnie> tag. In case it helps, here is the exact opposite of the regular expression that I need:
<a>((?!<minnie>)(?:.|\n))*?<\/a>
Example xml:
<a>
<id>1</id>
<goofy>Info</goofy>
<trudy>Info</trudy>
</a>
<a>
<id>2</id>
<goofy>Info</goofy>
<trudy>Info</trudy>
<mickey>
<pluto>Info</pluto>
<minnie>Info</minnie>
</mickey>
</a>
<a>
<id>3</id>
<goofy>Info</goofy>
<trudy>Info</trudy>
</a>
<a>
<id>4</id>
<goofy>Info</goofy>
<trudy>Info</trudy>
</a>
<a>
<id>5</id>
<goofy>Info</goofy>
<trudy>Info</trudy>
<mickey>
<pluto>Info</pluto>
<minnie>Info</minnie>
</mickey>
</a>
<a>
<id>6</id>
<goofy>Info</goofy>
<trudy>Info</trudy>
</a>
In this case it should extract only the <a> tags with id = 2 and id = 5.
CodePudding user response:
You may use this regex:
<a>(?:(?:(?!</a>).)*\n)*?\s*<minnie>(?:.*\n)+?</a>
RegEx Details:
<a> : Match <a>
(?: : Start a non-capture group
(?:(?!</a>).)* : Match 0 or more of any character as long as there is no </a> at the next position. (This is to prevent matches across the <a> tags)
\n : Match a new line
)*? : End non-capture group. Repeat this 0 or more times (non-greedy)
\s* : Match 0 or more whitespace characters
<minnie> : Match <minnie>
(?:.*\n)+? : Match 1 or more lines lazily
</a> : Match </a>
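To try the pattern quickly, here is a minimal Python sketch. It assumes the lazy quantifier is +? (matching the "1 or more lines lazily" description) and uses a shortened copy of the sample XML from the question. The names PATTERN, SAMPLE and extract_blocks are made up for this illustration.

```python
import re

# Regex from the answer; the last group is assumed to use +?
# ("match 1 or more lines lazily").
PATTERN = re.compile(r"<a>(?:(?:(?!</a>).)*\n)*?\s*<minnie>(?:.*\n)+?</a>")

# Shortened version of the question's sample: only the block with
# id 2 contains a <minnie> tag.
SAMPLE = """<a>
<id>1</id>
<goofy>Info</goofy>
</a>
<a>
<id>2</id>
<mickey>
<pluto>Info</pluto>
<minnie>Info</minnie>
</mickey>
</a>
<a>
<id>3</id>
<goofy>Info</goofy>
</a>
"""


def extract_blocks(text):
    """Return every <a>...</a> block in which a line starts with <minnie>."""
    return [match.group(0) for match in PATTERN.finditer(text)]


blocks = extract_blocks(SAMPLE)
for block in blocks:
    print(block)
```

Note that this pattern only finds <minnie> when it is the first thing on its line (after optional whitespace), as in the sample. If the XML can be laid out differently, a real XML parser is likely a safer choice than a regex.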