I have a website that I want to scrape data from.
Page 1 contains 30 items, page 2 contains 30 items, and so on until the last page.
What I want is a single array containing the data from all pages:
[
// All pushes go into 1 array
{
animeURL: "animeURL",
animeTitle: "animeTitle"
},
{
animeURL: "animeURL",
animeTitle: "animeTitle"
},
...
]
My code does get the data I want, but the problem is that the output is split across many arrays, I guess because of the separate pushes.
What I get in console.log:
// Array from the first push, matching the first loop iteration
[
{
animeURL: "animeURL",
animeTitle: "animeTitle"
},
{
animeURL: "animeURL",
animeTitle: "animeTitle"
},
]
// Array from the second push, matching the second loop iteration
[
{
animeURL: "animeURL",
animeTitle: "animeTitle"
},
{
animeURL: "animeURL",
animeTitle: "animeTitle"
},
]
// ... array from page 3, 4, 5, 6, ...
Here's my code:
const PORT = 8000
const axios = require('axios')
const cheerio = require('cheerio')
const express = require('express')
const app = express()
function fetchAnimeData() {
let animeData = []
for (let i = 1; i < 3; i++) {
let url = `https://animehay.club/loc-phim/W1tdLFtdLFtdLFtdXQ==/trang-${i}.html`;
axios(url)
.then(response =>{
const html = response.data
const $ = cheerio.load(html, {xmlMode: true})
$('.movie-item', html).each(function(){
const animeUrl = $(this).find('a').attr('href')
const animeTitle = $(this).find('a').attr('title')
animeData.push({
animeUrl, animeTitle
})
})
console.log(animeData)
}).catch(err => console.log(err))
}
}
fetchAnimeData()
app.listen(PORT, ()=> {console.log(`Server is running on PORT: ${PORT}`)})
I've tried moving the animeData variable around, making it a global variable, and calling console.log in different places. Some attempts print only [], and others give the same result as the problem I described. How can I console.log a single array that contains the data from all pages?
CodePudding user response:
You should keep track of your promises:
let jobs = []
and in every loop iteration, push the request's promise onto it:
jobs.push(axios(url) ...etc )
At the end, wait for all jobs to finish before logging:
Promise.all(jobs).then(()=>{
console.log(animeData);
})
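Putting the pieces together, here is a minimal sketch of that pattern. To keep it runnable without a network connection, it assumes a hypothetical fakeFetchPage function standing in for the axios + cheerio step from the question; in the real scraper you would pass a function that requests the page and resolves to the items parsed from it. Here fetchAnimeData also takes the fetcher and the last page number as parameters, which is my own choice and not part of the original code.

```javascript
// Hypothetical stand-in for the axios + cheerio step: resolves to the
// items scraped from one page. Later pages resolve sooner on purpose,
// to show that completion order does not affect the result.
function fakeFetchPage(page) {
  const items = [
    { animeUrl: `/anime/${page}-a`, animeTitle: `Title ${page}-a` },
    { animeUrl: `/anime/${page}-b`, animeTitle: `Title ${page}-b` },
  ]
  return new Promise(resolve => setTimeout(() => resolve(items), (4 - page) * 10))
}

// Start one request per page, keep every promise, and wait for all of them.
function fetchAnimeData(fetchPage, lastPage) {
  const jobs = []
  for (let i = 1; i <= lastPage; i++) {
    jobs.push(fetchPage(i))
  }
  // Promise.all resolves to an array of per-page arrays, in page order;
  // flat() merges them into a single array.
  return Promise.all(jobs).then(pages => pages.flat())
}

fetchAnimeData(fakeFetchPage, 3)
  .then(animeData => {
    // Runs once, after every page has finished.
    console.log(animeData)
  })
  .catch(err => console.log(err))
```

Because each job resolves to its own page's items, nothing is pushed into a shared array while requests are still in flight, and the single console.log only runs after the last page has arrived. Note that Promise.all rejects as soon as any one request fails, so one broken page would discard the whole result unless each job handles its own error.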