I want to scrape a site, but the problem is that the page content loads in about 1 second while a loader in the navbar keeps spinning for 30 seconds to a minute, so my Puppeteer scraper keeps waiting for that navbar loader to finish.
Is there any way to run window.stop() after a certain timeout?
const checkBook = async () => {
  await page.goto(`https://wattpad.com/story/${bookid}`, {
    waitUntil: 'domcontentloaded',
  });
  const is404 = await page.$('#story-404-wrapper');
  if (is404) {
    socket.emit('error', {
      message: 'Story not found',
    });
    await browser.close();
    return {
      error: true,
    };
  }
  storyLastUpdated = await page
    .$eval(
      '.table-of-contents__last-updated strong',
      (ele: any) => ele.textContent,
    )
    .then((date: string) => getDate(date));
};
CodePudding user response:
You could drop the waitUntil: 'domcontentloaded' option in favor of a timeout, as documented here: https://github.com/puppeteer/puppeteer/blob/v14.1.0/docs/api.md#pagegotourl-options

Or set the timeout to zero and instead use one of the page.waitFor... methods, like this:

await page.waitForTimeout(30000);
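As a rough sketch of both options (the 5- and 30-second values are arbitrary, and the un-awaited goto() in the second variant is my reading of "set the timeout to zero", not something spelled out in this answer):

// Option 1: cap how long goto() may wait for the load event. If the navbar
// loader keeps the page "loading" past 5 s, goto() rejects with a
// TimeoutError that we simply ignore.
try {
  await page.goto(`https://wattpad.com/story/${bookid}`, { timeout: 5000 });
} catch (err) {
  // The content we need is usually rendered long before this fires.
}

// Option 2: disable the navigation timeout, do not await goto(), and wait
// for the element the scraper actually needs instead.
page.goto(`https://wattpad.com/story/${bookid}`, { timeout: 0 }).catch(() => {
  // Ignore late navigation errors.
});
await page.waitForSelector('.table-of-contents__last-updated strong', {
  timeout: 30000,
});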
CodePudding user response:
Similar approach to Marcel's answer. The following will do the job:
page.goto(url)
await page.waitForTimeout(1000)
await page.evaluate(() => window.stop())
// your scraper script goes here
await browser.close()
Notes:
- page.goto() is NOT awaited, so you save time compared to waiting until the DOMContentLoaded or Load events.
- If the goto was not awaited, you need to make sure your script can start the work with the DOM. You can either use page.waitForTimeout() or page.waitForSelector().
- You have to execute window.stop() within page.evaluate() so you can avoid this kind of error: Error: Navigation failed because browser has disconnected!