I'm trying to get the specific text strings below as separate outputs, i.e. scrape them from the HTML below:
let text = "That's the first text I need";
let text2 = "The second text I need";
let text3 = "The third text I need";
I don't know how to get text that's separated by different HTML tags.
<p>
<span ><span >Count:</span>31<br></span>
<span >Something:</span> That's the first text I need
<span ><span >Something2:</span> </span>The second text I need
<br><span >Something3:</span> The third text I need
</p>
CodePudding user response:
You can iterate the child nodes of the <p> and grab any nodes with nodeType === Node.TEXT_NODE that have nonempty content:
for (const e of document.querySelector("p").childNodes) {
  if (e.nodeType === Node.TEXT_NODE && e.textContent.trim()) {
    console.log(e.textContent.trim());
  }
}
// or to make an array:
const result = [...document.querySelector("p").childNodes]
  .filter(e =>
    e.nodeType === Node.TEXT_NODE && e.textContent.trim()
  )
  .map(e => e.textContent.trim());
console.log(result);
<p>
<span >
<span >Count:</span>
31
<br>
</span>
<span >Something:</span>
That's the first text I need
<span >
<span >Something2:</span>
</span>
The second text I need
<br>
<span >Something3:</span>
The third text I need
</p>
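If you want the three strings in separate variables, as in the question, you can destructure the array. Below is a minimal sketch of that idea, not part of the original snippet: directTextOf is a hypothetical helper name, and fakeP is a hand-written stand-in for the <p> element so the code also runs outside a browser. In a page you would pass document.querySelector("p") instead.

```javascript
// Hypothetical helper: returns the trimmed, non-empty text nodes
// that are direct children of `parent`.
const TEXT_NODE = 3; // numeric value of Node.TEXT_NODE

function directTextOf(parent) {
  return [...parent.childNodes]
    .filter(n => n.nodeType === TEXT_NODE && n.textContent.trim())
    .map(n => n.textContent.trim());
}

// Assumed stand-in for the <p> above (element nodes have nodeType 1,
// text nodes have nodeType 3). Only the fields the helper reads are mocked.
const fakeP = {
  childNodes: [
    { nodeType: 1, textContent: "Count: 31" },
    { nodeType: 1, textContent: "Something:" },
    { nodeType: 3, textContent: "\n  That's the first text I need\n  " },
    { nodeType: 1, textContent: "Something2:" },
    { nodeType: 3, textContent: "\n  The second text I need\n  " },
    { nodeType: 1, textContent: "" }, // the <br>
    { nodeType: 3, textContent: "\n  " }, // whitespace only, gets dropped
    { nodeType: 1, textContent: "Something3:" },
    { nodeType: 3, textContent: "\n  The third text I need\n" },
  ],
};

// Destructure into the three variables the question asks for
const [text, text2, text3] = directTextOf(fakeP);
console.log(text, "|", text2, "|", text3);
```

Destructuring relies on the order of the text nodes, so it only fits when the markup always has the same shape. If the number of entries varies, keep working with the array instead.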
In Cheerio:
const cheerio = require("cheerio"); // 1.0.0-rc.12
const html = `
<p>
<span >
<span >Count:</span>
31
<br>
</span>
<span >Something:</span>
That's the first text I need
<span >
<span >Something2:</span>
</span>
The second text I need
<br>
<span >Something3:</span>
The third text I need
</p>
`;
const $ = cheerio.load(html);
const result = [...$("p").contents()]
  .filter(e => e.type === "text" && $(e).text().trim())
  .map(e => $(e).text().trim());
console.log(result);
CodePudding user response:
Try something like this and see if it works:
const html = `your sample html above`;
const domdoc = new DOMParser().parseFromString(html, "text/html");
// Select every text node that has no <span> ancestor
const result = domdoc.evaluate('//text()[not(ancestor::span)]', domdoc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
for (let i = 0; i < result.snapshotLength; i++) {
  const target = result.snapshotItem(i).textContent.trim();
  if (target.length > 0) {
    console.log(target);
  }
}
Using your sample html, the output should be:
"That's the first text I need"
"The second text I need"
"The third text I need"