RegExp match groups tag that contains specific tag

Time:12-01

I need to extract ONLY the entire contents of all the <a> tags that contain a <minnie> tag. In case it helps, here is the exact opposite of the regular expression I need:

  • <a>((?!<minnie>)(?:.|\n))*?<\/a>

Example xml:

<a>
    <id>1</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
</a>
<a>
    <id>2</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
    <mickey>
        <pluto>Info</pluto>
        <minnie>Info</minnie>
    </mickey>
</a>
<a>
    <id>3</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
</a>
<a>
    <id>4</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
</a>
<a>
    <id>5</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
    <mickey>
        <pluto>Info</pluto>
        <minnie>Info</minnie>
    </mickey>
</a>
<a>
    <id>6</id>
    <goofy>Info</goofy>
    <trudy>Info</trudy>
</a>

In this case it should extract only the <a> tags with id = 2 and id = 5.

CodePudding user response:

You may use this regex:

<a>(?:(?:(?!</a>).)*\n)*?\s*<minnie>(?:.*\n)+?</a>


RegEx Details:

  • <a>: Match <a>
  • (?:: Start a non-capture group
    • (?:(?!</a>).)*: Match 0 or more of any character, as long as there is no </a> at the next position. (This prevents matches from crossing from one <a> tag into the next.)
    • \n: Match a new line
  • )*?: End non-capture group. Repeat this 0 or more times (non-greedy)
  • \s*: Match 0 or more whitespace characters
  • <minnie>: Match <minnie>
  • (?:.*\n)+?: Match 1 or more lines lazily
  • </a>: Match </a>
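
As a rough check of the breakdown above, here is a minimal sketch that runs the pattern with Python's re module. The helper name blocks_with_minnie and the shortened sample are illustrative assumptions, not part of the original answer. The pattern as written expects each closing </a> to sit at the start of its own line, as in the example XML.

```python
import re

# Pattern from the answer, written as a raw string for Python's re module.
PATTERN = re.compile(r"<a>(?:(?:(?!</a>).)*\n)*?\s*<minnie>(?:.*\n)+?</a>")

# Shortened version of the question's sample (hypothetical trimming):
# only the block with id 2 contains a <minnie> tag.
SAMPLE = """<a>
    <id>1</id>
    <goofy>Info</goofy>
</a>
<a>
    <id>2</id>
    <mickey>
        <pluto>Info</pluto>
        <minnie>Info</minnie>
    </mickey>
</a>
<a>
    <id>3</id>
    <trudy>Info</trudy>
</a>
"""


def blocks_with_minnie(xml_text):
    """Return every <a>...</a> block that contains a <minnie> tag."""
    return [m.group(0) for m in PATTERN.finditer(xml_text)]


matches = blocks_with_minnie(SAMPLE)
# Pull the id out of each matched block to see which ones were kept.
ids = [re.search(r"<id>(\d+)</id>", block).group(1) for block in matches]
print(ids)
```

Blocks without <minnie> are skipped because the line-by-line group cannot consume a line that starts with </a>, so the search cannot run on into the next block.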