I'm trying to use web scraping with Puppeteer to extract a table from a company website.
But I don't understand why the browser opens Chromium instead of my default Chrome. This then leads to "TimeoutError: Navigation timeout of 30000 ms exceeded", which doesn't leave me enough time to use a CSS selector. I can't find any documentation about this.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('https://www....com');
  // search term
  await page.type("#search_term", "Brazil");
  //await page.screenshot({path: 'sc2.png'});
  //await browser.close();
})();
Answer:
By default, Puppeteer launches the browser bundled with it (Chromium), not the Chrome installed on your system.
If you want to use Chrome instead, you have to pass its location through the executablePath launch option. To be honest, though, most of the time there is no point in doing so.
const browser = await puppeteer.launch({
  // Placeholder: replace with the real path to your Chrome executable.
  executablePath: `/path/to/Chrome`,
  //...
});
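If the same script has to run on machines where Chrome is installed in different places, one option is to read the path from an environment variable and fall back to the bundled browser when it is not set. This is a minimal sketch: the variable name `CHROME_PATH` and the helper `buildLaunchOptions` are hypothetical; only `executablePath` and `headless` are real `puppeteer.launch` options.

```javascript
// Hypothetical helper: build the options object passed to puppeteer.launch().
// CHROME_PATH is an assumed environment variable; Puppeteer does not read it itself.
function buildLaunchOptions(env = process.env) {
  const options = { headless: false };
  if (env.CHROME_PATH) {
    // Use the locally installed Chrome instead of Puppeteer's bundled browser.
    options.executablePath = env.CHROME_PATH;
  }
  return options;
}

// Usage (inside an async function):
// const browser = await puppeteer.launch(buildLaunchOptions());
```

Leaving `executablePath` out when the variable is unset keeps Puppeteer's default behaviour, so the script still works on machines without a separate Chrome install.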
There is no connection between "TimeoutError: Navigation timeout of 30000 ms exceeded" and the use of Chromium. It is more likely that your target URL isn't (yet) reachable or is slow to respond.
page.goto will throw an error if:
- there's an SSL error (e.g. in case of self-signed certificates).
- target URL is invalid.
- the timeout is exceeded during navigation.
- the remote server does not respond or is unreachable.
- the main resource failed to load.
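Of the failures listed above, a timeout is usually the only one worth retrying; an invalid URL or an SSL error will fail the same way every time. A minimal sketch of that idea, assuming Puppeteer's `TimeoutError` reports `err.name === 'TimeoutError'` (checking `err instanceof puppeteer.TimeoutError` is an alternative). The helper name `gotoWithRetry` and its parameters are hypothetical.

```javascript
// Hypothetical helper: retry page.goto() only when it fails with a timeout.
// Other failures (bad URL, SSL error, unreachable host) are rethrown at once,
// because retrying them rarely helps.
async function gotoWithRetry(page, url, options = {}, attempts = 3) {
  for (let i = 1; i <= attempts; i++) {
    try {
      return await page.goto(url, options);
    } catch (err) {
      const isTimeout = err && err.name === 'TimeoutError';
      // Give up on non-timeout errors, or when the last attempt also timed out.
      if (!isTimeout || i === attempts) throw err;
      console.warn(`Attempt ${i} timed out, retrying...`);
    }
  }
}
```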
By default, the maximum navigation timeout is 30 seconds. If your target URL genuinely needs more time to load (which seems unlikely), you can raise the limit with the timeout option; timeout: 0 disables the limit entirely.
await page.goto(`https://github.com/`, {timeout: 0});
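Rather than disabling the timeout completely, a finite but longer limit still fails fast when the site is truly unreachable. The sketch below also sets `waitUntil: 'domcontentloaded'`, a real `page.goto` option that resolves once the HTML is parsed instead of waiting for the default `load` event; whether that is enough for your page is an assumption you would need to test. The helper `gotoWithTimeout` and the 60-second default are hypothetical.

```javascript
// Hypothetical helper: navigate with an explicit, longer timeout.
// `page` is a Puppeteer Page (or any object with a compatible goto method).
async function gotoWithTimeout(page, url, timeoutMs = 60000) {
  return page.goto(url, {
    // A finite value still throws if the server never answers; 0 would wait forever.
    timeout: timeoutMs,
    // Resolve after the HTML is parsed rather than after every image and script loads.
    waitUntil: 'domcontentloaded',
  });
}

// Usage (inside an async function):
// const response = await gotoWithTimeout(page, 'https://github.com/');
```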
Keep in mind that page.goto will not throw an error when the remote server returns any valid HTTP status code, including 404 "Not Found" and 500 "Internal Server Error".
I usually check the HTTP status code of the response to make sure I'm not scraping a 404 "Not Found" page or another error response.
const response = await page.goto(`https://github.com/`);
// goto() can resolve to null; ok() is true only for 2xx status codes.
if (response && response.ok()) {
  console.log(`HTTP response status code ${response.status()}`);
  //...
}
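To stop the script before it tries to scrape an error page, a non-2xx response can be turned into an exception, so it fails the same way a navigation error would. This is a sketch, not a Puppeteer API: the helper `assertOk` is hypothetical, while `response.ok()` and `response.status()` are real `HTTPResponse` methods.

```javascript
// Hypothetical helper: throw on a missing or non-2xx response so that code
// after page.goto() never runs against an error page.
function assertOk(response, url) {
  if (!response) {
    // goto() can resolve to null, e.g. for same-document (hash) navigations.
    throw new Error(`No HTTP response for ${url}`);
  }
  if (!response.ok()) {
    // ok() covers the whole 200-299 range, so 404, 500, etc. all end up here.
    throw new Error(`HTTP ${response.status()} for ${url}`);
  }
  return response;
}

// Usage (inside an async function):
// assertOk(await page.goto(url), url);
```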
I'm flying blind here, since I don't have your target URL or more information on what you're trying to accomplish.
You should also give the Puppeteer API documentation on GitHub a read.