Scraping with puppeteer not returning any data

Time:07-26

I'm doing some very basic scraping with puppeteer, as below:

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(
    "https://results.birmingham2022.com/#/athletic-sports-entries/SWM/*"
  );

  // Wait for it to do a bunch of background auth, api calls, etc.
  await sleep(45 * 1000);
  console.log("Finished sleeping.");

  console.log("Finding tables...");
  const eventsTables = await page.evaluate(() => {
    const tables = document.querySelectorAll(
      "app-athletics-sports-entries > div > div > div > table"
    );

    return tables.length;
  });
  console.log("Found events tables:", eventsTables);

  await browser.close();
})();

function sleep(ms) {
  return new Promise((resolve) => {
    console.log(`Waiting for ${ms / 1000} seconds...`);
    return setTimeout(resolve, ms);
  });
}

I've confirmed that the selector works by running it in DevTools, where it returns a NodeList of length 1764. The code above reports the same count. However, if I change the code to return the tables variable itself, page.evaluate returns undefined.

  • I've tried Array.from(tables), still undefined.
  • I've tried iterating over tables.values(), pushing each item into a new array and returning the new array, still undefined.

I'm completely lost as to what I'm doing wrong here.

CodePudding user response:

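I think your issue lies in the serialization of DOM nodes. Whatever you return from evaluate has to be serialized to get from the browser page back to your Node.js script, and DOM nodes cannot be serialized. Your attempted solutions still try to return nodes from evaluate, so serialization fails and you get undefined. More on the subject can be found here.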

One workaround would be to modify your second attempted solution:

const elements = [];
for (const el of tables.values()) {
  // outerHTML is a plain string, so it survives serialization
  elements.push(el.outerHTML);
}

Then return elements from evaluate. I tried this and all of the node content gets printed by your console.log, but that is quite a large amount of HTML. A better approach may be to narrow down what you are actually looking for and push only that into the elements array. Just make sure the items in the array are strings.
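As a rough sketch of that refinement (my own illustration, not something taken from the question): the function below returns only the trimmed text of each cell, as nested arrays of strings, which serialize without trouble. The name extractTableText is hypothetical, and I'm assuming the matched elements are ordinary table elements that expose rows and cells.

```javascript
// Hypothetical helper. It runs inside the page, so it may only rely on
// browser globals (document) and on the arguments passed to it.
function extractTableText(selector) {
  const tables = document.querySelectorAll(selector);
  // One array per table, one array per row, one trimmed string per cell.
  return Array.from(tables, (table) =>
    Array.from(table.rows, (row) =>
      Array.from(row.cells, (cell) => cell.textContent.trim())
    )
  );
}

// Usage inside the async function from the question:
//   const data = await page.evaluate(
//     extractTableText,
//     "app-athletics-sports-entries > div > div > div > table"
//   );
//   console.log("First table:", data[0]);
```

Since page.evaluate forwards any extra arguments to the page function, the selector can be passed in rather than hard-coded. If you only need a few columns, pick those out of row.cells instead of mapping over all of them.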
