I'm scraping some paragraphs from a website and ran into a problem that I don't know how to resolve.
The structure is something like this, for example:
<div class = "container">
<p> This is a long paragraph 1. </p>
<p> This is a long paragraph 2. </p>
<p> This is a long paragraph 3. </p>
<p> This is a long paragraph 4. </p>
</div>
So I did something like this to get the text inside the example paragraphs I've just mentioned.
const axios = require('axios');
const cheerio = require('cheerio');

function scrapeData() {
  let data = []
  let url = `scraping-url`;
  axios(url)
    .then(response => {
      const html = response.data
      const $ = cheerio.load(html, {xmlMode: true})
      $('.container', html).each(function(){
        // text() on the whole set of p tags returns one combined string
        const text = $(this).find('p').text()
        data.push({
          text
        })
        console.log(data)
      })
    }).catch(err => console.log(err))
}
But the result I get is {This is a long paragraph 1.This is a long paragraph 2.This is a long paragraph 3.This is a long paragraph 4.}, with all the paragraphs stuck together. I want to separate them so that each paragraph is its own chunk of text.
This is how I want it to look in my console.log(data):
{
This is a long paragraph 1.
This is a long paragraph 2.
This is a long paragraph 3.
This is a long paragraph 4.
}
CodePudding user response:
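Calling text() on a set of several p tags returns their combined text, which is why the paragraphs run together. Adapt the selector to match the p tags directly, then loop through each one and build your data.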
Try this:
// select p tags in the container
$('.container p', html).each(function(){
  const text = $(this).text();
  data.push({
    text
  });
});
console.log(data);