Regex to match anchor text but not <a href link

Time:04-10

I am trying to write a regex pattern that matches a certain word (carrots) in a link's anchor text, but not when the word is in the <a href link itself.

I want to match every use of carrots that is outside an href, including the ones in the anchor text.

carrots (<- matches, ok)
carrot (<- no match, ok)
carrots and potatoes (<- matches, ok)
potatoes and carrots (<- matches, ok)
<a href="/carrots and potatoes">carrots and potatoes</a> (<- matches 'carrots' in link and anchor text, but I only want the one in anchor text)
<a href="/carrots">carrots and potatoes</a> (see above, it matches both but I want anchor text only)

The regex I have so far is:

~<a .*?">|\bcarrots\b

Here is the regex101 I am using for testing: https://regex101.com/r/1RKDEa/1

This also has to work in JavaScript (I make a RegExp out of the pattern afterwards), so I can't use (*SKIP) and (*FAIL).

CodePudding user response:

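You can use a negative lookbehind, which rejects any carrots preceded by a < with no > in between (that is, any carrots inside a tag): (?<!\<[^>]*)\bcarrots\b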
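To check that pattern outside the browser, here is a minimal sketch that lists where it matches in the question's sample strings. The helper name matchPositions is made up for illustration, and the snippet assumes an engine with lookbehind support (for example a recent Node.js).

```javascript
// Same pattern as in the answer, minus the backslash before "<", which is not needed.
// (?<!<[^>]*) fails when a "<" precedes the word with no ">" in between,
// i.e. when the word sits inside a still-open tag such as <a href="...">.
const carrotsOutsideTags = /(?<!<[^>]*)\bcarrots\b/g;

// Hypothetical helper: returns the start index of every match in the input.
function matchPositions(input) {
  return [...input.matchAll(carrotsOutsideTags)].map((m) => m.index);
}

// One match: only the "carrots" in the anchor text, at index 19.
console.log(matchPositions('<a href="/carrots">carrots and potatoes</a>'));

// No match: "carrot" lacks the trailing "s".
console.log(matchPositions("carrot"));

// One match, at index 13.
console.log(matchPositions("potatoes and carrots"));
```

One caveat with this approach: a stray < in ordinary text (say, "1 < 2 carrots") also counts as an open tag, so the word after it would be skipped.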

Lookbehind is not supported in all browsers, though.
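If lookbehind is not available, one possible fallback is the same alternation idea as in the question: let the first branch consume whole tags and decide in a replacement callback what to wrap, which stands in for (*SKIP)(*FAIL). This is only a sketch, not part of the original answer. The function name markOutsideTags is made up, and it assumes tags contain no literal > inside attribute values.

```javascript
// Hypothetical helper, not from the original answer.
// The alternation consumes whole tags first, so "carrots" inside href="..."
// is swallowed by the tag branch and never reaches the word branch.
function markOutsideTags(html, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`<[^>]*>|\\b${escaped}\\b`, "g");
  // Tags are returned unchanged; only the word matches get wrapped.
  return html.replace(pattern, (match) =>
    match.startsWith("<") ? match : `<mark>${match}</mark>`
  );
}

console.log(markOutsideTags('<a href="/carrots">carrots and potatoes</a>', "carrots"));
// <a href="/carrots"><mark>carrots</mark> and potatoes</a>
```

Because the tag branch comes first and matches start at the leftmost position, the href is always consumed as part of the tag before the word branch gets a chance at it.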

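// "html" and "text" are the elements with those ids, which browsers expose as globals.
const data = html.innerHTML.replace(/(?<!\<[^>]*)\bcarrots\b/g, "<mark>$&</mark>");

html.innerHTML = data;   // rendered result, with the matches highlighted
text.textContent = data; // the same markup shown as plain text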
<div id="html" style="white-space:pre">carrots
carrot
carrots and potatoes
potatoes and carrots

<a href="website.com/carrots and potatoes">carrots and potatoes</a>

<a href="/carrots">carrots and potatoes</a></div>


<div id="text" style="white-space:pre;border:1px solid black"></div>
