I am trying to write a regex pattern that matches a certain word (carrots) in a link's anchor text, but not when the word is inside the <a href attribute itself.
Any use of carrots that is not in an href, including one in the anchor text, is what I want to match.
carrots (<- matches, ok)
carrot
carrots and potatoes (<- matches, ok)
potatoes and carrots (<- matches, ok)
<a href="/carrots and potatoes">carrots and potatoes</a> (<- matches 'carrots' in link and anchor text, but I only want the one in anchor text)
<a href="/carrots">carrots and potatoes</a> (see above, it matches both but I want anchor text only)
The regex I have so far is:
~<a .*?">|\bcarrots\b
Here is the regex101 I am using for testing: https://regex101.com/r/1RKDEa/1
This also has to work in JavaScript (I build a RegExp out of it afterwards), so I can't use (*SKIP) and (*FAIL).
CodePudding user response:
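You can use a negative lookbehind: (?<!\<[^>]*)\bcarrots\b
It rejects any carrots that is preceded by a < with no > in between, i.e. one that sits inside a tag. Not all browsers support lookbehind assertions, though.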
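// `html` and `text` are the elements with those ids in the markup below (implicit id globals).
// Wrap every "carrots" that is not inside a tag in <mark>.
const data = html.innerHTML.replace(/(?<!\<[^>]*)\bcarrots\b/g, "<mark>$&</mark>");
html.innerHTML = data;   // render the highlighted markup
text.textContent = data; // show the resulting markup as plain text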
<div id="html" style="white-space:pre">carrots
carrot
carrots and potatoes
potatoes and carrots
<a href="website.com/carrots and potatoes">carrots and potatoes</a>
<a href="/carrots">carrots and potatoes</a></div>
<div id="text" style="white-space:pre;border:1px solid black"></div>
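If lookbehind is not available in a browser you need to support, one possible workaround is to keep the alternation idea from the question (match a whole tag OR the word) and decide in a replace callback which alternative matched. This is only a sketch: markOutsideTags is a hypothetical helper name, it assumes the word contains no regex metacharacters, and like the lookbehind version it assumes no > appears inside attribute values.

```javascript
// Hypothetical helper: highlight a word outside of tags without using lookbehind.
function markOutsideTags(markup, word) {
  // Alternation: either a whole tag (<...>) or the standalone word.
  // Assumption: `word` contains no regex metacharacters.
  const pattern = new RegExp("<[^>]*>|\\b(" + word + ")\\b", "g");
  // Group 1 is undefined when the tag alternative matched,
  // so the tag is returned unchanged and only the word gets wrapped.
  return markup.replace(pattern, (match, hit) =>
    hit === undefined ? match : "<mark>" + hit + "</mark>"
  );
}

const sample = '<a href="/carrots">carrots and potatoes</a>';
console.log(markOutsideTags(sample, "carrots"));
// prints: <a href="/carrots"><mark>carrots</mark> and potatoes</a>
```

Because the tag alternative comes first and consumes the whole tag, the word inside the href is never reached by the second alternative, which is roughly what (*SKIP)(*FAIL) would do in PCRE.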