Regex to find <h1>Kalani Doe</h1> but not <h1>Kalani Do

I need to match first and last names inside an h1 element, such as:

<h1>Kalani Doe</h1>

But not headings that contain the word "Profile", such as:

<h1>Kalani Doe | Profile</h1>

I know I need a negative lookahead, (?!...), but I can't figure out how to combine the positive match with the negative one.

I tried this:

(<h1>(.+?)<\/h1>)(?!<h1>(.+?)Profile<\/h1>)

But it didn't work.

CodePudding user response：

You can use a negative lookbehind, (?<!chars):

const str = `<h1>Kalani Doe</h1>
<h1>Kalani Doe | Profile</h1>
<h1>John  Doe</h1>`;

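// .* is greedy and stays on one line; the lookbehind rejects the match
// when "Profile" sits directly before the closing </h1>.
const m = str.match(/<h1>.*(?<!Profile)<\/h1>/g);
console.log(m); // [ '<h1>Kalani Doe</h1>', '<h1>John  Doe</h1>' ]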

CodePudding user response：

If you are able to use a DOM parser, you could get all the h1 elements and keep only those whose text does not contain Profile.
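A minimal sketch of that DOM approach, assuming you already have a Document to query. The function name headingsWithoutProfile and its doc parameter are made up for illustration and are not from the question:

```javascript
// Hypothetical helper: collect the text of every <h1> that does not
// mention the word "Profile".
// `doc` is assumed to be a Document (or anything exposing querySelectorAll).
function headingsWithoutProfile(doc) {
  return Array.from(doc.querySelectorAll("h1"))
    .map((h1) => h1.textContent.trim())
    // \b keeps words such as "Profiles" from counting as a hit
    .filter((text) => !/\bProfile\b/.test(text));
}
```

In a browser you could call it as headingsWithoutProfile(document), or parse a string first with new DOMParser().parseFromString(html, "text/html") and pass in the result.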

As the question is tagged only with regex, and assuming there are no other angle brackets inside the heading, you can use a negated character class, [^<]*, instead of the lazy dot in your pattern, because the dot can match any character, including <.

If Profile should not be right before the closing tag:

<h1>[^<]*<(?<!\bProfile<)\/h1>

<h1> Match literally
[^<]*< Match any char except < and then match <
(?<!\bProfile<) Assert not Profile< to the left
\/h1> Match /h1> (Note that depending on the pattern delimiter, the / does not need escaping by itself)
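To see how this pattern behaves, here is a small sketch in JavaScript. The sample string and the variable names are made up for illustration; the third heading is there to show that Profile is only rejected when it sits directly before the closing tag:

```javascript
// Sample input (invented for this example)
const html = `<h1>Kalani Doe</h1>
<h1>Kalani Doe | Profile</h1>
<h1>Profile of Kalani Doe</h1>`;

// Reject a heading only when "Profile" comes right before </h1>
const notEndingInProfile = /<h1>[^<]*<(?<!\bProfile<)\/h1>/g;

const endMatches = html.match(notEndingInProfile);
console.log(endMatches);
// [ '<h1>Kalani Doe</h1>', '<h1>Profile of Kalani Doe</h1>' ]
```

Note that lookbehind needs a reasonably recent JavaScript engine.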


If there should be no Profile at all:

<h1>(?![^<]*\bProfile\b)[^<]*<\/h1>
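A short sketch of this second pattern in JavaScript, with one change that is my own assumption: I wrapped [^<]* in a capture group so the name can be read out directly. The sample string and variable names are invented for illustration:

```javascript
// Sample input (invented for this example)
const sample = `<h1>Kalani Doe</h1>
<h1>Kalani Doe | Profile</h1>
<h1>Profile of Kalani Doe</h1>`;

// The lookahead scans the heading text up to the next "<" and fails
// if the word "Profile" appears anywhere in it.
// The capture group around [^<]* is an addition to the pattern above.
const noProfileAnywhere = /<h1>(?![^<]*\bProfile\b)([^<]*)<\/h1>/g;

// matchAll yields one match object per heading; index 1 is the captured name
const names = Array.from(sample.matchAll(noProfileAnywhere), (m) => m[1].trim());
console.log(names); // [ 'Kalani Doe' ]
```

This version only uses a lookahead, so it should also work in engines that lack lookbehind support.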
