I am using Puppeteer (a Node library for driving headless Chrome) to scrape a website. After navigating to the URL, how can I load jQuery so that I can use it inside my page.evaluate() function?
All I have now is a .js file running the code below. It goes to my URL as intended, but then I get an error in page.evaluate(), apparently because jQuery is not being loaded the way I expected by the code on line 7: await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'})
Any ideas how I can load jQuery correctly here, so that I can use jQuery inside my page.evaluate() function?
const puppeteer = require('puppeteer');
(async () => {
  let url = "[website url I'm scraping]";
  let browser = await puppeteer.launch({headless: false});
  let page = await browser.newPage();
  await page.goto(url, {waitUntil: 'networkidle2'});
  await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'}); // doesn't seem to load jQuery: page.evaluate() below errors
  await page.evaluate(() => {
    // want to use jQuery here to access the DOM
    var classes = $("td:contains('Lec')");
    classes = classes.not('.Comments');
    classes = classes.not('.Pct100');
    classes = Array.from(classes);
  });
})();
CodePudding user response:
You are on the right path.
Also, I don't see any jQuery code being used in your evaluate function.
There is no document.getElement function.
The best way would be to add a local copy of jQuery to avoid any cross-origin errors.
More details can be found in the already answered question
So the jQuery code is definitely working.
Also check whether the host website already has its own jQuery instance. In that case you would need to use jQuery's noConflict mode:
$.noConflict();
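The two suggestions above (use a local copy, and check for an existing jQuery first) could be combined roughly as follows. This is only a sketch: the helper name ensureJquery is hypothetical, and the file path in the usage comment assumes jQuery was installed with npm install jquery. It relies on page.addScriptTag accepting a path option, which reads the script from disk instead of fetching it from a URL.

```javascript
// Hypothetical helper, not part of Puppeteer: injects a local jQuery file
// only when the page has not already defined window.jQuery.
async function ensureJquery(page, jqueryPath) {
  // Runs in the browser context; true if the site ships its own jQuery.
  const alreadyLoaded = await page.evaluate(
    () => typeof window.jQuery === 'function'
  );
  if (!alreadyLoaded) {
    // `path` makes Puppeteer read the file from disk and inject its contents,
    // so no request to a third-party CDN is needed.
    await page.addScriptTag({path: jqueryPath});
  }
  return alreadyLoaded;
}

// Usage inside an async function, once the page has finished navigating
// (the path is an assumption based on `npm install jquery`):
//   await ensureJquery(page, 'node_modules/jquery/dist/jquery.min.js');
```

When the site already has jQuery, nothing is injected, so inside page.evaluate() you would call the page's own window.jQuery rather than loading a second copy and untangling it with $.noConflict().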
CodePudding user response:
Fixed it!
I realized I forgot to include the code where I did some extra navigation clicks after going to my initial URL. The problem was that I added the script tag on my initial URL instead of after navigating to my final destination URL, so the injected script was gone once the page changed.
I also needed to use
await page.waitForNavigation({waitUntil: 'networkidle2'})
before adding the script tag, so that the destination page was fully loaded first.
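A rough sketch of that ordering, assuming the extra navigation is triggered by a click. The helper name and the selector argument are placeholders for illustration, not the original code.

```javascript
// Hypothetical helper: click something that navigates, wait for the new page
// to settle, and only then inject jQuery.
async function clickThenAddJquery(page, selector, jqueryUrl) {
  // Start waiting before clicking so the navigation event is not missed.
  await Promise.all([
    page.waitForNavigation({waitUntil: 'networkidle2'}),
    page.click(selector),
  ]);
  // A script tag added before the navigation would be discarded along with
  // the old document, so it is added to the destination page instead.
  await page.addScriptTag({url: jqueryUrl});
}

// Usage inside an async function (the selector is a made-up example):
//   await clickThenAddJquery(page, '#next', 'https://code.jquery.com/jquery-3.2.1.min.js');
```

Wrapping the wait and the click in Promise.all is a common way to avoid a race where the navigation finishes before waitForNavigation has started listening.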