Puppeteer element selection returning null or timing out

Time:07-06

I am trying to use Puppeteer to extract the innerHTML value from a button on a webpage. For now, I am simply trying to wait for the selector to appear so that I can then work with it.

When I run the code below, the program times out waiting for the selector.

const puppeteer = require("puppeteer");

const link =
  "https://etherscan.io/tx/0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32";

async function configureBrowser() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(link);

  return page;
}

async function findFee(page) {
  await page.reload({ waitUntil: ["networkidle0", "domcontentloaded"] });
  await page.waitForSelector("#txfeebutton");
  console.log("boom");
}

const setup = async () => {
  const page = await configureBrowser();
  await findFee(page);
  await browser.close();
};

setup();

As you can see below, the element definitely exists in the HTML:

[Screenshot: HTML evidence]

Console output:

[Screenshot not preserved]

Answer:

It works fine with a user agent string:

const puppeteer = require("puppeteer"); // ^14.3.0

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  const ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
  await page.setExtraHTTPHeaders({"Accept-Language": "en-US,en;q=0.9"});
  await page.setUserAgent(ua);
  const url = "https://etherscan.io/tx/0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32";
  await page.goto(url);
  const btn = await page.waitForSelector("#txfeebutton");
  console.log(await btn.evaluate(el => el.textContent.trim())); // => ($0.56)
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;

One debugging strategy for this is to try the same script with headless: false and see if that works, then check page.content() when running headlessly. In this case, the content shows that Cloudflare is detecting your scraper and presenting a captcha.
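A rough sketch of that debugging step, in case it helps: the helper below waits for a selector and, if the wait fails, prints the title and the start of the HTML that the page actually served before rethrowing. The name waitForSelectorOrDump, the default timeout, and the 500-character cutoff are my own choices for illustration and are not part of Puppeteer. It only relies on page.waitForSelector(), page.title() and page.content().

```javascript
// Hypothetical helper (not a Puppeteer API): wait for a selector and, if the
// wait fails, log what the page actually contains before rethrowing.
// `page` is a Puppeteer Page, or any object exposing the same
// waitForSelector(), title() and content() methods.
async function waitForSelectorOrDump(page, selector, timeout = 30000) {
  try {
    return await page.waitForSelector(selector, { timeout });
  } catch (err) {
    const title = await page.title();
    const html = await page.content();
    console.error(`Failed waiting for ${selector}`);
    console.error(`Page title: ${title}`);
    // The first few hundred characters are usually enough to recognize
    // a captcha or block page (cutoff chosen arbitrarily).
    console.error(html.slice(0, 500));
    throw err; // keep the original failure visible to the caller
  }
}
```

In the script above you would call waitForSelectorOrDump(page, "#txfeebutton") in place of page.waitForSelector("#txfeebutton"). If the dumped HTML is a challenge page rather than the transaction page, the selector was never going to appear.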

As an aside, configureBrowser never returns the browser object it launches, so setup has no reference to it and you'll never be able to call browser.close() and gracefully terminate the process. I recommend the above boilerplate and avoiding premature abstractions.
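If you do want to keep a helper, here is a minimal sketch of one way to hold on to the browser reference so it always gets closed. The function name withPage and its parameters are hypothetical. The launcher argument is assumed to be the puppeteer module (or anything with a compatible launch() method), passed in rather than required so the sketch does not hard-code the dependency.

```javascript
// Hypothetical helper: launch a browser, open `link`, run `work(page)`, and
// close the browser whether or not `work` succeeds.
// `launcher` is assumed to be require("puppeteer") or a compatible object.
async function withPage(launcher, link, work) {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    await page.goto(link);
    return await work(page);
  } finally {
    // `browser` is still in scope here, unlike in the original setup().
    await browser.close();
  }
}

// Example usage with the question's findFee (assumes puppeteer is installed):
// withPage(require("puppeteer"), link, findFee).catch(err => console.error(err));
```

Because the launch and the close live in the same function, a timeout inside work still ends with browser.close() being called.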