I'm doing some very basic scraping with puppeteer, as below:
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(
    "https://results.birmingham2022.com/#/athletic-sports-entries/SWM/*"
  );

  // Wait for it to do a bunch of background auth, api calls, etc.
  await sleep(45 * 1000);
  console.log("Finished sleeping.");

  console.log("Finding tables...");
  const eventsTables = await page.evaluate(() => {
    const tables = document.querySelectorAll(
      "app-athletics-sports-entries > div > div > div > table"
    );
    return tables.length;
  });
  console.log("Found events tables:", eventsTables);

  await browser.close();
})();

function sleep(ms) {
  return new Promise((resolve) => {
    console.log(`Waiting for ${ms / 1000} seconds...`);
    return setTimeout(resolve, ms);
  });
}
I've confirmed that the selector works fine by using DevTools: it returns a NodeList with a length of 1764, and the above code reports the same count. If I change the above code to return the tables variable itself, though, it returns undefined.
- I've tried Array.from(tables), still undefined.
- I've tried iterating over tables.values(), pushing each item into a new array and returning the new array, still undefined.
I'm completely lost as to what I'm doing wrong here.
CodePudding user response:
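I think your issue lies in the serialization of node elements. The value returned from evaluate has to be serialized to travel from the browser page back to your Node.js script, and DOM nodes cannot be serialized. Your attempted solutions still try to return nodes from evaluate, so the serialization fails and you get undefined. More on the subject can be found here.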
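To make the serialization point concrete, here is a minimal sketch of a page function that returns only plain data (numbers and objects) instead of nodes. The name describeTables is hypothetical, and I'm assuming you would pass it to page.$$eval, which hands the matched elements to the function as an array. This is a different entry point from the page.evaluate call in the question, not a correction of it.

```javascript
// Page function for page.$$eval: receives an array of matched elements and
// returns only plain, serializable data (numbers inside plain objects).
// `describeTables` is a hypothetical helper name, not part of Puppeteer.
function describeTables(tables) {
  return tables.map((table, index) => ({
    index,
    rowCount: table.rows.length, // HTMLTableElement.rows
    htmlLength: table.outerHTML.length,
  }));
}

// Assumed usage, with `page` being the Puppeteer page from the question:
// const summary = await page.$$eval(
//   "app-athletics-sports-entries > div > div > div > table",
//   describeTables
// );
// console.log(summary.length, summary[0]);
```

Because the function touches nothing outside its own arguments, it still works after Puppeteer ships it to the browser, and everything it returns survives the trip back.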
One workaround would be to modify your second attempted solution:
const elements = [];
for (const el of tables.values()) {
  elements.push(el.outerHTML);
}
and return elements from evaluate. I tried this solution, and all the node content gets printed by your console.log. That is quite a large amount of HTML, though, so a better approach may be to narrow down what you are actually looking for and push only that into the elements array. Just make sure the array's elements are strings, so that they can be serialized.
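As a rough sketch of that narrower approach, the function below collects only the trimmed cell text of each table row, not the full outerHTML. The name extractTableRows and the root parameter are my own assumptions for illustration. root defaults to the page's document and exists only so the function can be tried outside a browser. I have not run this against the site in the question, so the row and cell structure (tr, th, td) is an assumption as well.

```javascript
// Intended to be passed to page.evaluate, so it must not reference anything
// from the Node.js scope. `extractTableRows` is a hypothetical helper name.
// `root` defaults to the page's `document`; it is a parameter only so the
// function can be exercised outside a browser.
function extractTableRows(selector, root = document) {
  const tables = root.querySelectorAll(selector);
  // Result shape: one array per table, one array per row, one string per cell.
  return Array.from(tables, (table) =>
    Array.from(table.querySelectorAll("tr"), (row) =>
      Array.from(row.querySelectorAll("th, td"), (cell) =>
        cell.textContent.trim()
      )
    )
  );
}

// Assumed usage, with `page` being the Puppeteer page from the question:
// const rows = await page.evaluate(
//   extractTableRows,
//   "app-athletics-sports-entries > div > div > div > table"
// );
// console.log("Found events tables:", rows.length);
```

The result is nested arrays of strings, so it serializes cleanly and is much smaller than the raw HTML of 1764 tables.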