I have a regex that works when searching HTML <h> family tags, but it does not work if any other tag is nested inside the <h> tag. See the examples below.
<h([\d]).*>\s*[\d]*\s?[.]?\s?([^<]+)<\/h([\d])>
It works for this input:
<h2 style="margin-top:1em;">What is Python?</h2>
It does not work for this input:
<h2 style="margin-top:1em;">Python Jobs<span >New!</span></h2>
How can I capture Python Jobs<span >New!</span> as the second group? I need 3 capturing groups: the 2 of the opening h2, Python Jobs<span >New!</span> as the second group, and the 2 of the closing h2.
CodePudding user response:
([^<]+) means to match a sequence of anything except < before </h2>. Since the nested tags contain < characters, this won't match them.
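To make the failure concrete, here is a minimal sketch that runs the question's pattern against both sample headings. It assumes Python's re module (the question does not name a regex flavor), and the variable names are only illustrative.

```python
import re

# The question's pattern: the second group, [^<]+, cannot cross a "<".
pattern = re.compile(r"<h([\d]).*>\s*[\d]*\s?[.]?\s?([^<]+)<\/h([\d])>")

plain = '<h2 style="margin-top:1em;">What is Python?</h2>'
nested = '<h2 style="margin-top:1em;">Python Jobs<span >New!</span></h2>'

plain_match = pattern.search(plain)    # matches: no "<" inside the heading text
nested_match = pattern.search(nested)  # None: the "<" of <span> stops [^<]+

print(plain_match.groups() if plain_match else None)
print(nested_match)
```

In the nested case, [^<]+ can only consume Python Jobs or New!, and neither is followed directly by </h2>, so the whole match fails.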
Use .+? to match the contents of the tag. The ? makes it non-greedy, so it will stop when it gets to the first </h#>.
You can also use a back-reference in the </h#> part of the match, so the closing tag is forced to match the opening tag.
<h(\d).*?>\s*\d*\s?\.?\s?(.+?)<\/h(\1)>
BTW, there's no need to put \d inside [].
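As a quick check, the corrected pattern can be tried on the headings from the question. This is a sketch assuming Python's re module; the numbered and mismatched headings are made-up test inputs, not from the question.

```python
import re

# Non-greedy (.+?) captures nested tags; (\1) forces the closing
# heading level to equal the opening one while staying a third group.
pattern = re.compile(r"<h(\d).*?>\s*\d*\s?\.?\s?(.+?)<\/h(\1)>")

samples = [
    '<h2 style="margin-top:1em;">What is Python?</h2>',
    '<h2 style="margin-top:1em;">Python Jobs<span >New!</span></h2>',
    '<h3>1. Intro</h3>',   # hypothetical numbered heading
    '<h2>Title</h3>',      # hypothetical mismatched tags
]

results = []
for html in samples:
    m = pattern.search(html)
    results.append(m.groups() if m else None)

for html, groups in zip(samples, results):
    print(html, "->", groups)
```

Note that . does not match newlines by default, so a heading whose content spans several lines would need the re.DOTALL flag.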