replaceAll() in JavaScript failed to find </em> in HTML page

Time:01-04

I am not familiar with JavaScript or HTML, but I tried to implement a function in JavaScript.

I want to replace every <em> and </em> in an HTML page with an underscore, so I inserted this piece of JavaScript into the page:

function rep() 
{
    document.body.innerHTML
        = document.body.innerHTML
        .replaceAll("<em>", "_");
    document.body.innerHTML
        = document.body.innerHTML
        .replaceAll("</em>", "_");

}
window.onload=rep()
<!DOCTYPE html>
<html lang="en">
<!-- ... -->
<article>
    <div >
        <div >
            <div >

                <p>(Weierstrass) Let $z_{0}$ be an essential singularity of $f$. Then for any $A \in \mathbb{C}<em>{\infty}$, there exists a sequence of points $\left{z_{n}\right}$ tending to $z</em>{0}$ such that $\lim <em>{n \rightarrow \infty} f\left(z</em>{n}\right)=A$.</p>

            </div>
        </div>
    </div>
<!-- ... -->

</html>

It succeeded in replacing <em> with "_", but none of the </em> tags changed. What's wrong with the code? Thank you!

CodePudding user response:

Let's see what happens when a browser encounters invalid HTML like:

test</em>

console.log(document.body.innerHTML)
test</em>

The above prints test (plus the script element); the stray </em> is gone.

That's because the HTML parser ignores an end tag that has no matching start tag, so it never makes it into the DOM.

When you do

document.body.innerHTML
  = document.body.innerHTML
  .replaceAll("<em>", "_");

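you replace every <em> correctly. However, assigning the result to innerHTML makes the browser parse the string again, and each </em> that has lost its opening tag is dropped at that point. By the time the second replaceAll runs, innerHTML no longer contains any </em> to replace.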

This, on the other hand, works, because both replacements are applied to the string before it is assigned back:

document.body.innerHTML = document.body.innerHTML
  .replaceAll("<em>", "_")
  .replaceAll("</em>", "_");
<em>test</em>
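
The round trip through the parser can be checked on a detached element, without touching the page. This is only a minimal sketch: roundTrip is a hypothetical helper name, and it returns null when no DOM is available (for example in Node.js).

```javascript
// Hypothetical helper: parse a string as HTML and serialize it again.
function roundTrip(html) {
  if (typeof document === "undefined") {
    return null; // no DOM available (e.g. Node.js)
  }
  const div = document.createElement("div");
  div.innerHTML = html;   // the parser ignores the unmatched </em>
  return div.innerHTML;   // serialized from the DOM, not from the input
}

// In a browser this logs "test": the stray end tag does not survive.
console.log(roundTrip("test</em>"));
```

The same thing happens to document.body after the first assignment in the question's code.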

CodePudding user response:

It may be better to use the available DOM methods for this.

  1. Pick up all the em elements with querySelectorAll.

  2. For each element, build a string that bookends the element's original text content with underscores, and create a text node from it. Use replaceWith to replace the em element with the text node.

const ems = document.querySelectorAll('em');

ems.forEach(em => {
  const text = `_${em.textContent}_`;
  const node = document.createTextNode(text);
  em.replaceWith(node);
});
<p>(Weierstrass) Let $z_{0}$ be an essential singularity of $f$. Then for any $A \in \mathbb{C}<em>{\infty}$, there exists a sequence of points $\left{z_{n}\right}$ tending to $z</em>{0}$ such that $\lim <em>{n \rightarrow \infty} f\left(z</em>{n}\right)=A$.</p>

<ul>
  <li><em>This is some italicised text</em></li>
  <li>And this is not.</li>
  <li><em>But this is</em>.</li>
</ul>


CodePudding user response:

Processing HTML with regexes or string functions is a bad idea (HTML is not a string), but if you must, do it like this:

    let html = document.body.innerHTML
    html = html.replace(...)
    html = html.replace(...) etc
    document.body.innerHTML = html

In other words, do not use a partially processed string to set innerHTML.
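
Applied to the question, that pattern could look like the sketch below. replaceEmTags is a hypothetical helper name, and the document guard is only there so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: do every replacement on a plain string, so no
// intermediate result is ever handed to the HTML parser.
function replaceEmTags(html) {
  return html
    .replaceAll("<em>", "_")
    .replaceAll("</em>", "_");
}

// In a browser: read innerHTML once, write it back once.
if (typeof document !== "undefined") {
  document.body.innerHTML = replaceEmTags(document.body.innerHTML);
}

// Works on any string, e.g. a fragment like the one in the question.
console.log(replaceEmTags("$z<em>{0}$ and $f(z</em>{n})$"));
// logs: $z_{0}$ and $f(z_{n})$
```

Because the string only reaches innerHTML after both replacements, the parser never sees a </em> without its opening tag.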

CodePudding user response:

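Simpler, but less efficient:

document.body.innerHTML = document.body.innerHTML.replace(/<em>|<\/em>/g, '_');

Result:

//body before: <em>test</em>
//body after: _test_

The regex scans the entire body string and replaces every <em> or </em> occurrence with _. The result has to be assigned back to innerHTML, because replace returns a new string and does not change the page by itself.

The g (global) flag makes replace cover every occurrence rather than only the first. The m (multiline) flag is not needed here: it only changes how ^ and $ match, and this pattern uses neither.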
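
The effect of the flags can be checked on a plain string that spans several lines. This is a small sketch with a hypothetical helper name (underscoreEmTags); it uses only the g flag, since m affects nothing but the ^ and $ anchors.

```javascript
// Hypothetical helper: one pattern covers both <em> and </em>.
function underscoreEmTags(html) {
  return html.replace(/<\/?em>/g, "_");
}

const sample = "<em>first</em>\n<em>second</em>";

// With g, every occurrence on every line is replaced.
console.log(underscoreEmTags(sample));
// logs:
// _first_
// _second_

// Without g, only the first match is replaced.
console.log(sample.replace(/<\/?em>/, "_").split("\n")[0]);
// logs: _first</em>
```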
