My case is: I have a string with HTML elements:
<a href="something specific_string" title="testing">This is a text and "specific_string"</a>
I need a Regex to match only the one that is not in a HTML attribute.
This is my current Regex, it works but it gives a false positive when the string is wrapped by double quotes
((?!\"[\w\s]*)specific_string(?![\w\s]*\"))
I have tried the following Regex:
((?!\"[\w\s]*)specific_string(?![\w\s]*\"))
It works but it gives a false positive when the string is wrapped by double quotes
CodePudding user response:
if you want to get what's inside the tag you might be trying to use the split() tool; to cut the string every >" or "<" basically like this:
let string = "<a href='something specific_string' title='testing'>This is a text and 'specific_string'</a>";
string = string.split('>');
string = string[1].split('<');
console.log(string)
So, when you want to manipulate it, just use position 0 of the string. Is not regex like u wnat, but is an idea
CodePudding user response:
Though it can suffice in simple cases, you should know it's often said that RegExp is ill-suited for parsing HTML, and depending on environment you could be better off using more robust techniques. (There's http://htmlparsing.com/ dedicated to the topic but yet it doesn't discuss JS.)
That said, the following works in Chrome 107 and Node 16.13.
(s=>s.match(/(?<=>[^<]*|^[^<]*)specific_string/))
('<a href="something specific_string" title="testing">This is a text and "specific_string"</a>')
It uses look-behind. In lieu of that you could use /(>[^<]*|^[^<]*)(specific_string)/
and compensate index/lengths to get the position of a match...
As you answer in a comment that you'll replace in user-provided HTML, I encourage you to consider security implications (namely XSS).
Back on the topic of parsing HTML w/o RegExp we obviously have the techniques in a web browser and I couldn't stop myself writing a quick and dirty textNode replacer in web JS, working in Chrome 107:
((html, fun) => {
const el = document.createElement('body')
el.innerHTML = html
const X = new XPathEvaluator, R = X.evaluate('//*[text()]', el)
const A = []; for (let n; n = R.iterateNext();) A.push(n) // mutating el while iterating XPathResult is illegal
for (let n of A) fun(n)
return el.innerHTML})
('<a href="something specific_string" title="testing">This is a text and "specific_string"</a>',
n => n.innerHTML = n.innerHTML
.replace(/specific_string/, '<b>replaced</b>'))