RegExp: match tags that contain a specific tag

I need to extract ONLY the entire contents of all the <a> tags containing the <minnie> tag. In case it helps, here is the exact opposite of the regular expression that I need:

<a>((?!<minnie>)(?:.|\n))*?<\/a>

Example xml:

<a>
    <id>1</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
</a>
<a>
    <id>2</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
    <mickey>
        <pluto>Info</pluto>
        <minnie>Info</minnie>
    </mickey>
</a>
<a>
    <id>3</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
</a>
<a>
    <id>4</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
</a>
<a>
    <id>5</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
    <mickey>
        <pluto>Info</pluto>
        <minnie>Info</minnie>
    </mickey>
</a>
<a>
    <id>6</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
</a>

In this case it should extract only the <a> tags with id = 2 and id = 5

CodePudding user response：

You may use this regex:

<a>(?:(?:(?!</a>).)*\n)*?\s*<minnie>(?:.*\n)+?</a>

RegEx Details:

<a>: Match <a>
(?:: Start a non-capture group
- (?:(?!</a>).)*: Match 0 or more of any character, as long as there is no </a> at the next position. (This prevents a match from spanning multiple <a> tags.)
- \n: Match a new line
)*?: End non-capture group. Repeat this 0 or more times (non-greedy)
\s*: Match 0 or more whitespaces
<minnie>: Match <minnie>
(?:.*\n)+?: Match 1 or more lines lazily
</a>: Match </a>
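
To try the pattern outside a regex tester, here is a minimal sketch in Python. The choice of Python and the names xml, pattern, matches and ids are assumptions made for illustration; the sample input is a trimmed subset of the XML from the question.

```python
import re

# Trimmed subset of the sample XML from the question (ids 1, 2, 3 and 5).
xml = """<a>
    <id>1</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
</a>
<a>
    <id>2</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
    <mickey>
        <pluto>Info</pluto>
        <minnie>Info</minnie>
    </mickey>
</a>
<a>
    <id>3</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
</a>
<a>
    <id>5</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
    <mickey>
        <pluto>Info</pluto>
        <minnie>Info</minnie>
    </mickey>
</a>
"""

# The regex from the answer. "." does not match a newline here (no DOTALL flag),
# so the pattern walks the input one line at a time.
pattern = re.compile(r"<a>(?:(?:(?!</a>).)*\n)*?\s*<minnie>(?:.*\n)+?</a>")

# The pattern has no capturing groups, so findall returns each full <a>...</a> match.
matches = pattern.findall(xml)

# Pull the id out of each matched block to check which ones were selected.
ids = [re.search(r"<id>(\d+)</id>", block).group(1) for block in matches]
print(ids)  # ['2', '5']
```

Note that, as written, the pattern relies on the layout of the sample: <minnie> must be the first thing on its line after optional whitespace, and the closing </a> must sit at the very start of a line. If the real input is formatted differently, an XML parser is likely to be more robust than a regex.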