Regular Expression matches "<" except "<em>"

Time:01-18

I have a client input string in which some words are highlighted, like <em>TEST</em>. However, the string also contains other < characters (either a lone < or a < followed by some letters). I want to replace those < characters with another character, or delete them, while keeping <em>TEST</em> intact.

I want to use a regular expression that matches every < except the ones in <em>TEST</em>. I have tried many variations but still have no clue. Please help me out.

Answer:

/<(?!\/?em>)/

This assumes you want to leave every <em> and </em> tag alone, not just the ones around TEST.
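With the global flag, the same pattern can either delete the stray < characters or substitute something else for them, which covers both options from the question. A minimal sketch, assuming plain JavaScript strings; the helper names `stripStrayLt` and `escapeStrayLt` and the sample input are hypothetical:

```javascript
// Matches "<" unless it starts "<em>" or "</em>" (the pattern above, plus the g flag)
const strayLt = /<(?!\/?em>)/g;

// Hypothetical helper: delete every stray "<"
function stripStrayLt(input) {
  return input.replace(strayLt, '');
}

// Hypothetical helper: replace every stray "<" with the HTML entity "&lt;"
function escapeStrayLt(input) {
  return input.replace(strayLt, '&lt;');
}

const sample = 'a < b and <em>TEST</em> and <x';
console.log(stripStrayLt(sample));  // "a  b and <em>TEST</em> and x"
console.log(escapeStrayLt(sample)); // "a &lt; b and <em>TEST</em> and &lt;x"
```

Replacing with `&lt;` rather than deleting keeps the user's text visible when the string is later rendered as HTML.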

⚠️ Using a regex instead of a proper HTML parser will break on corner cases, and even on common cases you weren't anticipating. Use it at your own risk. You can keep extending the regex to handle more cases, but it will never cover 100% of them.
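As an illustration of "adding to the regex", here is one hypothetical extension (an assumption, not part of the original answer) that also keeps upper-case tags and tags with whitespace before the closing `>`. It still fails on cases such as `<em class="x">`, which is exactly the kind of gap the warning describes:

```javascript
// Hypothetical extension of the answer's pattern (not from the original answer):
//   i flag -> also keeps <EM> and </Em>
//   \s*    -> also keeps <em > and </em  > (whitespace before the closing ">")
// Still NOT handled: attributes like <em class="x">, which get their "<" replaced.
const strayLtLoose = /<(?!\/?em\s*>)/gi;

const input = '<EM>a</EM> <em >b</em > <emph>c</emph> 1 < 2';
console.log(input.replace(strayLtLoose, '&lt;'));
// "<EM>a</EM> <em >b</em > &lt;emph>c&lt;/emph> 1 &lt; 2"
```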

The snippet below demonstrates the pattern: it replaces every stray < with ❌, and the output updates as you type in the text area.

// Match "<" only when it is not followed by "em>" or "/em>"; the g flag replaces every occurrence
const pattern = /<(?!\/?em>)/g

const inField = document.getElementById('in')
const outField = document.getElementById('out')

// Escape HTML special characters so the result is displayed as text instead of rendered as markup
function escapeHtml(unsafe) {
  return unsafe
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;")
}

// Replace each stray "<" with ❌, then show the escaped result in the output element
function update() {
  const screened = inField.value.replace(pattern, '❌')
  outField.innerHTML = escapeHtml(screened)
}

inField.addEventListener('input', update, false)
inField.value = `test "<" replacement:
 - should NOT be replaced in: <em>TEST</em>
 - should be replaced in: <b>
 - should be replaced in: 4 < 3
 - should be replaced in: <em and </em
 - should be replaced in: <emph>these</emph>
`
update()
<textarea id="in" style="width:100%;height:40vh"></textarea>
<pre><code><div id="out"></div></code></pre>
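The same checks can be run outside the browser, for example in Node.js, without the text area. A minimal sketch that reuses the snippet's pattern, its ❌ marker and its test lines; the `screen` function and the `cases` array of expected results are hypothetical names:

```javascript
// Same pattern as the snippet above
const pattern = /<(?!\/?em>)/g;

// Replace every "<" that does not start <em> or </em> with the snippet's marker
function screen(text) {
  return text.replace(pattern, '❌');
}

// The snippet's test lines, each paired with the expected result
const cases = [
  ['<em>TEST</em>', '<em>TEST</em>'],
  ['<b>', '❌b>'],
  ['4 < 3', '4 ❌ 3'],
  ['<em and </em', '❌em and ❌/em'],
  ['<emph>these</emph>', '❌emph>these❌/emph>'],
];

for (const [input, expected] of cases) {
  const actual = screen(input);
  console.log(actual === expected ? 'ok  ' : 'FAIL', JSON.stringify(input), '->', JSON.stringify(actual));
}
```

Keeping the cases as data makes it easy to add a new corner case whenever the pattern is extended.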
