Negative lookahead skips the negated expression

Time:01-08

I have a text with a structure like this:

<section id="TP-1">
<h3>1. One</h3>
<p>Et harum quidem rerum facilis est et expedita distinctio.</p>
</section>

<hr  />

<section id="TP-2">
<h3>2. Two</h3>
<p>Et harum quidem rerum facilis est et expedita distinctio.</p>
<hr  />
<p class="footnote"></p>
</section>

I need to add an id attribute to the p tags belonging to the footnote class, and its value should be based on the id of the parent section tag. This is what I wrote:

<section id="TP-(\d+?)">((.|\n)+?)(?!</section>)<p class="footnote"

The problem is that the match is not the one I want: it includes </section> even though I used a negative lookahead. Here is the match:

<section id="TP-1">
<h3>1. One</h3>
<p>Et harum quidem rerum facilis est et expedita distinctio.</p>
</section>

<hr  />

<section id="TP-2">
<h3>2. Two</h3>
<p>Et harum quidem rerum facilis est et expedita distinctio.</p>
<hr  />
<p class="footnote"

While I expected this:

<section id="TP-2">
<h3>2. Two</h3>
<p>Et harum quidem rerum facilis est et expedita distinctio.</p>
<hr  />
<p class="footnote"

Here you can check the regex: https://regex101.com/r/qGIUYd/1

Answer:

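Your negative lookahead comes after (.|\n)+? (which, by the way, you should replace with . and the s flag). That lazy group keeps consuming text, </section> included, until it reaches a position where the rest of the pattern can match. The lookahead is tested only at that one position, immediately before <p class="footnote", where the next characters are <p and not </section>, so it always succeeds.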
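To make this concrete, here is a minimal Python sketch that reproduces the unwanted match. It assumes the characters stripped from the question's regex were + and class="footnote", and it uses a slightly simplified copy of the sample HTML (plain <hr /> tags); the variable names are only illustrative.

```python
import re

# Sample input, adapted from the question (assumption: the footnote
# paragraph is <p class="footnote"></p> and the <hr> tags are plain).
html = '''<section id="TP-1">
<h3>1. One</h3>
<p>Et harum quidem rerum facilis est et expedita distinctio.</p>
</section>

<hr />

<section id="TP-2">
<h3>2. Two</h3>
<p>Et harum quidem rerum facilis est et expedita distinctio.</p>
<hr />
<p class="footnote"></p>
</section>
'''

# The question's pattern: the lookahead sits AFTER the lazy group, so it
# is only tested at the position where the group stops expanding.
original = re.compile(
    r'<section id="TP-(\d+?)">((.|\n)+?)(?!</section>)<p class="footnote"'
)

m = original.search(html)
print(m.group(1))                   # -> 1 (the match starts in the first section)
print('</section>' in m.group(0))   # -> True (the closing tag was swallowed)
```

The lazy group simply grows past </section> until the literal <p class="footnote" can match; the lookahead never gets a chance to reject the characters in between.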

Put the negative lookahead inside the quantified expression, so that it is checked after every character the group consumes:

/<section id="TP-(\d+?)">(.(?!<\/section>))+?<p class="footnote"/gms

Demo: https://regex101.com/r/NBVy90/1
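As a rough illustration of how the corrected pattern could drive the actual replacement, here is a Python sketch. Several things in it are assumptions rather than part of the answer: the sample HTML is a simplified copy of the question's input, the inner group is made non-capturing and wrapped in a capturing group so the whole section body can be reused, and the id format TP-N-footnote and the helper add_footnote_id are made up for the example.

```python
import re

# Sample input, adapted from the question (assumption: the footnote
# paragraph is <p class="footnote"></p> and the <hr> tags are plain).
html = '''<section id="TP-1">
<h3>1. One</h3>
<p>Et harum quidem rerum facilis est et expedita distinctio.</p>
</section>

<hr />

<section id="TP-2">
<h3>2. Two</h3>
<p>Et harum quidem rerum facilis est et expedita distinctio.</p>
<hr />
<p class="footnote"></p>
</section>
'''

# Lookahead moved inside the repetition: no consumed character may be
# immediately followed by </section>. re.S plays the role of the s flag.
fixed = re.compile(
    r'<section id="TP-(\d+?)">((?:.(?!</section>))+?)<p class="footnote"',
    re.S,
)

def add_footnote_id(match):
    """Rebuild the matched text with an id on the footnote paragraph."""
    number, body = match.group(1), match.group(2)
    # Hypothetical id scheme: reuse the section id and append "-footnote".
    return (f'<section id="TP-{number}">{body}'
            f'<p id="TP-{number}-footnote" class="footnote"')

m = fixed.search(html)
result = fixed.sub(add_footnote_id, html)
print(m.group(1))   # -> 2 (only the section that contains a footnote matches)
print(result)
```

Note that each match consumes the opening section tag, so this sketch tags only the first footnote paragraph in a section; a section with several footnotes would need a different approach, such as an HTML parser.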
