I'm trying to create a regex that matches "Wonder woman" as long as it is not inside an a-tag.
The regex I have so far:
(Wonder woman)?<a.*?<\/a>|(\{\S ?\})
This matches the a-tag from beginning to end (including both). I think I'm close, but I'm out of ideas.
In the following string, I want to match the phrase "wonder woman" (case-insensitively) as long as it is not inside an a-tag. The last two lines (separated by a newline) are the ones I'm trying to write a regex for.
Don't match me. I'm just random text <a>Wonder woman</a>
<a>Wonder
woman</a>
<a>test</a>
<a> Wonder
Woman test test
</a>
This is some random text that should not be matched.
Wonder
Woman
I also tried the following regex, but it doesn't match "wonder woman" if the two words are on separate lines:
wonder woman(?![^<a>]*?<\/a>)
Any help with my regex is much appreciated.
Note: I'm not interested in replacing everything else with an empty string.
I want to match the specific word(s) and then insert a different word, let's say "Captain America".
CodePudding user response:
You can use a DOMParser to parse the HTML, then loop through each top-level text node and replace all occurrences of 'wonder woman' with 'captain america' (text inside an a-tag belongs to that element's child nodes, so it is skipped):
const str = `Don't match me. I'm just random text <a>Wonder woman</a>
<a>Wonder
woman</a>
<a>test</a>
<a> Wonder
Woman test test
</a>
This is some random text that should not be matched.
Wonder
Woman
`
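function match(s) {
  // Parse the string as an HTML document so that tags become element nodes
  const parsed = new DOMParser().parseFromString(s, 'text/html')
  // Only direct children of <body> are visited; text inside <a> sits one level deeper
  parsed.body.childNodes.forEach(e => {
    // nodeType 3 is a text node (Node.TEXT_NODE); $1 keeps the original whitespace
    if (e.nodeType === 3) e.data = e.data.replace(/wonder([\r\n ]*)woman/gi, 'captain$1america')
  })
  return parsed.body.innerHTML
}
console.log(match(str))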
CodePudding user response:
/(?<!<a(\s|>)(.(?!<\/a>))*)wonder\s+woman/gis
It looks for "wonder woman" such that there is no <a opening tag without a </a> closing tag in front of it.
const str = `Don't match me. I'm just random text <a>Wonder woman</a>
<a> yu Wonder
woman</a>
<a>test</a>
<a> Wonder
Woman test test
</a>
This is some random text that should not be matched.
Wonder
Woman
<a href="">
<span> value Wonder
Woman</span>
</a>
text . Wonder woman
<span></span> <a>..</a>`;
const result = str.replace(/(?<!<a(\s|>)(.(?!<\/a>))*)wonder\s+woman/gis, '*****');
console.log(result);
If there are nested <a></a> tags, such as <a><a>..</a>wonder woman</a>, it will be difficult to handle with regex alone.
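A lookbehind-free variant of the same idea is the alternation approach the question was already circling: let the first alternative consume each whole <a>...</a> element, capture the phrase in the second alternative, and decide in a replacer callback which of the two was hit. This is only a sketch under assumptions: `replaceOutsideAnchors` is a hypothetical helper name, the sample string is shortened, and nested a-tags are still not handled.

```javascript
// Hypothetical helper (not from the original answers): replaces the phrase
// only where it is not part of a non-nested <a>...</a> element.
function replaceOutsideAnchors(html, replacement) {
  // Alternative 1 consumes an entire <a ...>...</a> element, across line breaks.
  // Alternative 2 captures the phrase, allowing any whitespace between the words.
  const pattern = /<a\b[^>]*>[\s\S]*?<\/a>|(wonder\s+woman)/gi;
  return html.replace(pattern, (match, phrase) =>
    // phrase is undefined when alternative 1 matched: put the link back untouched.
    phrase === undefined ? match : replacement
  );
}

const sample = [
  "Don't match me. <a>Wonder woman</a>",
  '<a href="#"> Wonder',
  'Woman test test',
  '</a>',
  'Wonder',
  'Woman',
].join('\n');

const replaced = replaceOutsideAnchors(sample, 'Captain America');
console.log(replaced);
// Only the final "Wonder\nWoman" becomes "Captain America"; both links are unchanged.
```

Since the pattern uses no lookbehind, it should also run in engines that lack lookbehind support. The line break between the two words is replaced along with them.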
CodePudding user response:
Using lookaheads and lookbehinds can do the trick:
/(?<!<a>.*)wonder\swoman(?!.*<\/a>)/gi
Be aware that lookbehinds are not supported in older versions of Safari (support was added in Safari 16.4).
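Because this pattern has no s flag, the dot stops at line breaks, so both lookarounds only inspect the line on which the match starts or ends. The following check is a sketch of what that means in practice; the case names and sample strings are made up for illustration and are not from the question.

```javascript
// The pattern from the answer above, unchanged.
const pattern = /(?<!<a>.*)wonder\swoman(?!.*<\/a>)/gi;

// Hypothetical sample inputs, chosen to probe the line-based behaviour.
const cases = {
  sameLineInside: '<a>Wonder woman</a>',      // phrase on the same line as <a>
  ownLineInside: '<a>\nWonder woman\n</a>',   // phrase on its own line inside the link
  beforeLaterLink: 'Wonder woman <a>x</a>',   // phrase outside, link later on the line
  plain: 'Wonder\nWoman',                     // phrase split across two lines
};

const results = {};
for (const [name, text] of Object.entries(cases)) {
  results[name] = text.replace(pattern, 'Captain America');
}
console.log(results);
// sameLineInside:  unchanged (correctly skipped)
// ownLineInside:   '<a>\nCaptain America\n</a>' (replaced although it is inside the link)
// beforeLaterLink: unchanged (skipped although it is outside the link)
// plain:           'Captain America'
```

For the sample in the question the pattern gives the desired result; the two middle cases show inputs where a DOM-based approach would be the safer choice.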