I am not familiar with JavaScript and HTML, but I tried to implement a function in JavaScript.
I want to replace all <em> and </em> tags in an HTML page, so I inserted a piece of JavaScript code into the page:
function rep()
{
    document.body.innerHTML
        = document.body.innerHTML
            .replaceAll("<em>", "_");
    document.body.innerHTML
        = document.body.innerHTML
            .replaceAll("</em>", "_");
}
window.onload=rep()
<!DOCTYPE html>
<html lang="en">
<!-- ... -->
<article>
<div >
<div >
<div >
<p>(Weierstrass) Let $z_{0}$ be an essential singularity of $f$. Then for any $A \in \mathbb{C}<em>{\infty}$, there exists a sequence $\left{z_{n}\right}$ tending to $z</em>{0}$ such that $\lim <em>{n \rightarrow \infty} f\left(z</em>{n}\right)=A$.</p>
</div>
</div>
</div>
<!-- ... -->
</html>
It succeeded in replacing every <em> with "_", but none of the </em> tags changed. What's wrong with the code?
Thank you!
Answer:
Let's see what happens when the browser parses invalid HTML like this:
test</em>
<script>console.log(document.body.innerHTML)</script>
The above prints test (and the script), but no </em>. That's because the browser strips invalid structures when it parses HTML: a closing tag with no matching opening tag is simply dropped.
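When you do
document.body.innerHTML
    = document.body.innerHTML
        .replaceAll("<em>", "_");
you replace all <em> tags correctly, but assigning the result to innerHTML makes the browser parse the new string. Every </em> in it is now a closing tag without an opening tag, so the parser removes it, and the second replaceAll finds no </em> left to replace.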
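This, on the other hand, will work, because both replacements are applied to the same string before it is assigned to innerHTML and parsed again:
document.body.innerHTML = document.body.innerHTML
    .replaceAll("<em>", "_")
    .replaceAll("</em>", "_");
With <em>test</em> in the body, the result is _test_.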
Answer:
It may be better to use the available DOM methods for this. Pick up all the em elements with querySelectorAll. For each element, create a text node containing the element's original text content bookended with underscores, then use replaceWith to replace the em element with that text node.
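// querySelectorAll returns a static NodeList, so replacing
// elements while iterating over it is safe.
const ems = document.querySelectorAll('em');
ems.forEach(em => {
  // Wrap the element's text content in underscores...
  const text = `_${em.textContent}_`;
  const node = document.createTextNode(text);
  // ...and swap the <em> element for the plain text node.
  em.replaceWith(node);
});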
<p>(Weierstrass) Let $z_{0}$ be an essential singularity of $f$. Then for any $A \in \mathbb{C}<em>{\infty}$, there exists a sequence $\left{z_{n}\right}$ tending to $z</em>{0}$ such that $\lim <em>{n \rightarrow \infty} f\left(z</em>{n}\right)=A$.</p>
<ul>
<li><em>This is some italicised text</em></li>
<li>And this is not.</li>
<li><em>But this is</em>.</li>
</ul>
Answer:
Processing HTML with regexes or string functions is a bad idea (HTML is not a string), but if you must, it should be done like this:
let html = document.body.innerHTML;
html = html.replace(...);
html = html.replace(...); // etc.
document.body.innerHTML = html;
In other words, do not set innerHTML from a partially processed string: apply every replacement to the local variable first, then assign to innerHTML once.
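A minimal sketch of the same pattern, assuming the replacements are the two from the question (<em> and </em> both become _). The helper name applyReplacements and the rules array are hypothetical. The function works purely on strings, so it can be tried outside a browser; in the page, innerHTML would be read once and written once.

```javascript
// Hypothetical helper: applies each [pattern, replacement] pair in order
// to a local string and returns the final result. Nothing is written to
// the DOM here, so no intermediate string is ever parsed by the browser.
function applyReplacements(html, rules) {
  let result = html;
  for (const [pattern, replacement] of rules) {
    // pattern may be a string or a RegExp, as String.prototype.replace accepts
    result = result.replace(pattern, replacement);
  }
  return result;
}

// Assumed rules for this question: both tags become "_".
// The g flag makes each regex replace every occurrence, not just the first.
const rules = [
  [/<em>/g, "_"],
  [/<\/em>/g, "_"],
];

console.log(applyReplacements("<em>a</em> <em>b</em>", rules));
// prints: _a_ _b_

// In the browser, read and write innerHTML exactly once:
// document.body.innerHTML = applyReplacements(document.body.innerHTML, rules);
```

Keeping the rules in a list makes it easy to add more replacements without ever touching innerHTML in between.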
Answer:
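Simpler but not efficient:
document.body.innerHTML = document.body.innerHTML.replace(/<em>|<\/em>/g, '_');
Result:
//body before: <em>test</em>
//body after: _test_
The regex goes over the entire body HTML in a single replace call and replaces every <em> or </em> occurrence with _, so the result is assigned to innerHTML only once. The g (global) flag is what makes it replace all occurrences rather than just the first one. A multiline (m) flag is not needed here: it only changes how ^ and $ match, and this pattern uses neither.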
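A sketch of the single-regex variant as a plain string function, so it can be tested outside the browser. The name underscoreEmTagsRegex is hypothetical, and the pattern /<\/?em(\s[^>]*)?>/g is an assumed extension of the one above: it also matches opening tags that carry attributes (such as <em class="x">) while leaving other tags that start with "em", like <embed>, alone.

```javascript
// Hypothetical helper: replaces every <em ...> opening tag and </em>
// closing tag in an HTML string with "_" using one regular expression.
// Assumption: the input comes from innerHTML, which serializes HTML tag
// names in lowercase, so no case-insensitive flag is used.
function underscoreEmTagsRegex(html) {
  // <\/?        "<" optionally followed by "/"
  // em          the tag name
  // (\s[^>]*)?  optional attributes, which must start with whitespace
  //             (this is what keeps <embed> from matching)
  // >           end of the tag
  // g           replace every match in the string
  return html.replace(/<\/?em(\s[^>]*)?>/g, "_");
}

const before = '<p><em>test</em> and <em class="x">more</em></p>';
console.log(underscoreEmTagsRegex(before));
// prints: <p>_test_ and _more_</p>
```

In the page, the result still has to be assigned back: document.body.innerHTML = underscoreEmTagsRegex(document.body.innerHTML);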