I'm currently trying to get all posts of a paginated website. Everything is working fine; my only problem is that I don't know how to end my for loop when axios catches an error.
getMaxPageAmount(url: any) {
let maxPage = 600;
let allLinks = [] as any;
let collection = [] as any;
for (let i = 1; i < maxPage; i++) {
allLinks.push(
axios.get(url + i + "/").then(urlResponse => {
let $ = cheerio.load(urlResponse.data);
$("div.main-posts").each((i, element) => {
let link = $(element)
.find("div#entry-pic").find("a").get().map(x => $(x).attr('href'))
collection.push(link);
console.log(collection);
});
})
.catch((reason: AxiosError) => {
if (reason.response!.status == 404) {
//Need to break
}
})
)
}
Promise.all(allLinks).then(() => console.log(collection));
}
I already tried to exit the for loop with break, but then I get "Jump target cannot cross function boundary." A while loop was also not an option, because it seems to break the axios.get call.
CodePudding user response:
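With your current code, you can't, because your for loop finishes before any of the axios calls completes. Your code starts all the calls (without waiting), then waits for them all to complete. That is also why break is rejected: the catch callback is a separate function, and break cannot jump out of a function into a loop that encloses it.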
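To see the ordering for yourself, here is a minimal sketch that needs no network. `fakeGet` is a made-up stand-in for `axios.get` (it just returns an already-resolved promise), and `events` only records what happened when:

```typescript
// `fakeGet` is a hypothetical stand-in for `axios.get`: it returns a promise
// that is already resolved, so no network access is needed.
function fakeGet(page: number): Promise<number> {
  return Promise.resolve(page);
}

const events: string[] = [];
const pending: Promise<void>[] = [];

for (let i = 1; i <= 3; i++) {
  // This only *starts* the request; the `then` callback runs later.
  pending.push(
    fakeGet(i).then((page) => {
      events.push(`response ${page}`);
    })
  );
}
// The loop is already over before any callback has run.
events.push("loop finished");

const allDone: Promise<string[]> = Promise.all(pending).then(() => events);
allDone.then((order) => console.log(order));
// "loop finished" comes first, followed by "response 1", "response 2", "response 3"
```

By the time the first response is handled there is no loop left to break out of.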
If you want to do them one at a time, in sequence, then the easiest thing is to make your function an async function and await each result:
async getMaxPageAmount(url: any) {
let maxPage = 600;
let collection = [] as any; // *** Best to avoid using `any`
for (let i = 1; i < maxPage; i++) {
try {
const urlResponse = await axios.get(url + i + "/");
let $ = cheerio.load(urlResponse.data);
$("div.main-posts").each((i, element) => {
let link = $(element)
.find("div#entry-pic")
.find("a")
.get()
.map((x) => $(x).attr("href"));
collection.push(link);
// console.log(collection);
});
} catch (reason: any) {
if (reason.response?.status === 404) { // `?.` because errors without a response (e.g. network failures) have no `response`
break; // *** This will break the `for` loop
}
}
}
console.log(collection);
}
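If you'd rather have the function return the collection than log it, the same sequential idea can be sketched without axios or cheerio so it runs as-is. The names here are made up for illustration: `fetchPage` is a hypothetical callback that would wrap `axios.get(url + page + "/")` plus the cheerio parsing, and `HttpError` only mirrors the one part of an Axios-style error the loop looks at. Unlike the code above, this sketch rethrows non-404 errors instead of moving on to the next page; that is a design choice, not a requirement.

```typescript
// Shape of the part of an Axios-style error that we look at (an assumption
// made so the sketch runs without axios installed).
interface HttpError {
  response?: { status: number };
}

// `fetchPage` is a hypothetical callback; in the real code it would wrap
// `axios.get(url + page + "/")` plus the cheerio parsing.
async function collectUntil404<T>(
  fetchPage: (page: number) => Promise<T>,
  maxPage: number
): Promise<T[]> {
  const collection: T[] = [];
  for (let page = 1; page < maxPage; page++) {
    try {
      collection.push(await fetchPage(page));
    } catch (err) {
      const status = (err as HttpError | undefined)?.response?.status;
      if (status === 404) {
        break; // first missing page: stop asking for more
      }
      throw err; // anything else (timeout, 500, ...) is surfaced to the caller
    }
  }
  return collection;
}

// Usage with a fake site that has exactly three pages.
const requestedPages: number[] = [];
async function fakeFetch(page: number): Promise<string> {
  requestedPages.push(page);
  if (page > 3) {
    const notFound: HttpError = { response: { status: 404 } };
    throw notFound;
  }
  return `links from page ${page}`;
}

const sequentialResult = collectUntil404(fakeFetch, 600);
sequentialResult.then((pages) => console.log(pages, requestedPages));
```

With the fake site, only pages 1 to 4 are ever requested: the 404 on page 4 ends the loop, so the remaining requests are never made.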
Alternatively, you could do all the calls in parallel, but then when processing the results, throw away the results after an error. To do that, we'll want to assign to collection[i - 1] rather than using push so that the results are in the same order as the loop, store markers for 404s in the collection, and then later find the first marker and ignore everything after it (while removing the gaps that may have been left by non-404 errors).
getMaxPageAmount(url: any) {
let maxPage = 600;
let allLinks = [] as any;
let collection = [] as any;
for (let i = 1; i < maxPage; i++) {
allLinks.push(
axios
.get(url + i + "/")
.then((urlResponse) => {
let $ = cheerio.load(urlResponse.data);
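// Gather this page's links first. The callback's index parameter must
// not be named `i`, or it would shadow the loop's `i` used below.
const pageLinks = [] as any;
$("div.main-posts").each((_index, element) => {
let link = $(element)
.find("div#entry-pic")
.find("a")
.get()
.map((x) => $(x).attr("href"));
pageLinks.push(...link);
});
// One slot per page, in loop order
collection[i - 1] = pageLinks;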
})
.catch((reason: AxiosError) => {
if (reason.response?.status === 404) { // `?.` because errors without a response (e.g. network failures) have no `response`
collection[i - 1] = 404; // Our marker
}
})
);
}
Promise.all(allLinks).then(() => {
// Find the first marker
const ignoreFrom = collection.indexOf(404);
// If found, remove all subsequent elements
if (ignoreFrom !== -1) {
collection.length = ignoreFrom;
}
// Non-404 errors will have left gaps in the array, so we
// filter those out. `filter` only calls our callback for
// the non-gaps, so we just always return `true`:
collection = collection.filter(() => true);
console.log(collection);
});
}
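A variation on the same parallel idea, sketched so it runs without axios or cheerio: instead of writing markers into a shared array, each request resolves to a small tagged outcome, and the outcomes are walked in order afterwards. Everything named here is made up for the sketch: `fetchPage` is a hypothetical callback standing in for the `axios.get(...)` plus cheerio step, and `PageOutcome` and `ErrorWithResponse` are illustrative types, not part of any library.

```typescript
// Outcome of one page request. `status` is only known when the server
// answered with an error code.
type PageOutcome<T> =
  | { kind: "ok"; value: T }
  | { kind: "error"; status: number | undefined };

interface ErrorWithResponse {
  response?: { status: number };
}

async function collectInParallel<T>(
  fetchPage: (page: number) => Promise<T>,
  maxPage: number
): Promise<T[]> {
  const requests: Promise<PageOutcome<T>>[] = [];
  for (let page = 1; page < maxPage; page++) {
    requests.push(
      fetchPage(page).then(
        (value): PageOutcome<T> => ({ kind: "ok", value }),
        (err: ErrorWithResponse | undefined): PageOutcome<T> => ({
          kind: "error",
          status: err?.response?.status,
        })
      )
    );
  }

  // Promise.all keeps the results in request order, whatever order
  // the responses arrive in.
  const outcomes = await Promise.all(requests);

  const collection: T[] = [];
  for (const outcome of outcomes) {
    if (outcome.kind === "ok") {
      collection.push(outcome.value);
    } else if (outcome.status === 404) {
      break; // ignore everything from the first 404 onwards
    }
    // other failures are skipped, like the gaps in the array above
  }
  return collection;
}

// Usage with a fake site: five pages exist, page 2 fails with a 500.
async function fakeSite(page: number): Promise<string> {
  if (page > 5) throw { response: { status: 404 } };
  if (page === 2) throw { response: { status: 500 } };
  return `page ${page}`;
}

const parallelResult = collectInParallel(fakeSite, 10);
parallelResult.then((pages) => console.log(pages));
// pages 1, 3, 4 and 5 are kept; page 2 is skipped, pages 6 and up are ignored
```

Keep in mind that, as with the code above, every request up to maxPage is still sent; the 404 only decides which results are kept.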