Issue with Nested html tags

Time:03-30

I have a regex that matches HTML <h> family tags, but it does not work if there is any other tag inside the <h> tag. See the examples below.

<h([\d]).*>\s*[\d]*\s?[.]?\s?([^<]+)<\/h([\d])>

It works for:

<h2 style="margin-top:1em;">What is Python?</h2>

It does not work for:

<h2 style="margin-top:1em;">Python Jobs<span >New!</span></h2>

How can I capture Python Jobs<span >New!</span> as the second group? I need 3 capturing groups: the 2 of the opening h2, Python Jobs<span >New!</span> as the second group, and the 2 of the closing h2.

CodePudding user response:

([^<]+) matches a sequence of one or more characters other than < before </h2>. Since the nested tags contain < characters, this won't match them.

Use .+? to match the contents of the tag. The ? makes the quantifier non-greedy, so it will stop when it gets to the first </h#>.

You can also use a back-reference in the </h#> part of the match, so the closing tag is forced to match the opening tag.

<h(\d).*?>\s*\d*\s?\.?\s?(.+?)<\/h(\1)>

BTW, there's no need to put \d inside [].
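As a quick check of the points above, here is a minimal sketch in Python (the question does not say which language or regex engine is in use, so Python's re module is an assumption, as are the helper name groups and the mismatched sample string). It runs the pattern from the question and the pattern from this answer against both headings, plus a heading whose closing tag has a different number.

```python
import re

# Pattern from the question: [^<]+ cannot cross a nested tag.
ORIGINAL = re.compile(r'<h([\d]).*>\s*[\d]*\s?[.]?\s?([^<]+)<\/h([\d])>')
# Pattern from the answer: lazy quantifiers plus a back-reference (\1).
FIXED = re.compile(r'<h(\d).*?>\s*\d*\s?\.?\s?(.+?)<\/h(\1)>')

simple = '<h2 style="margin-top:1em;">What is Python?</h2>'
nested = '<h2 style="margin-top:1em;">Python Jobs<span >New!</span></h2>'
mismatched = '<h2>Python Jobs</h3>'  # hypothetical input with a wrong closing tag


def groups(pattern, text):
    """Return the captured groups of the first match, or None if there is none."""
    match = pattern.search(text)
    return match.groups() if match else None


print(groups(ORIGINAL, simple))     # ('2', 'What is Python?', '2')
print(groups(ORIGINAL, nested))     # None
print(groups(FIXED, simple))        # ('2', 'What is Python?', '2')
print(groups(FIXED, nested))        # ('2', 'Python Jobs<span >New!</span>', '2')
print(groups(ORIGINAL, mismatched)) # ('2', 'Python Jobs', '3')
print(groups(FIXED, mismatched))    # None
```

The last two lines show the effect of the back-reference: the original pattern accepts <h2>...</h3>, while the fixed one rejects it. Note that . does not match a newline by default in Python, so a heading that spans several lines would need the re.DOTALL flag.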
