I am attempting to scrape data from https://monkeytype.com/ like this:
const wordsSelector = await page.waitForSelector('#words');
console.log(await wordsSelector.$eval('div', words => words.innerHTML));
This works as expected, returning the following (the string spells "interest"):
<letter>i</letter><letter>n</letter><letter>t</letter><letter>e</letter><letter>r</letter><letter>e</letter><letter>s</letter><letter>t</letter>
However, there are multiple divs with similar innerHTML inside #words that I would like to scrape.
I was under the impression that simply replacing .$eval with .$$eval was the correct way to do this. However, when I make the change like this:
const wordsSelector = await page.waitForSelector('#words');
console.log(await wordsSelector.$$eval('div', words => words.innerHTML));
words becomes an empty array and words.innerHTML is obviously undefined.
Am I using .$$eval incorrectly?
Answer:
When using .$$eval(), the array of elements passed to the callback is typically mapped over:
const puppeteer = require("puppeteer"); // ^19.0.0
let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  const url = "https://monkeytype.com/";
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const $ = (...args) => page.waitForSelector(...args);
  await (await $(".rejectAll")).click();
  await $("#words .word.active");
  const wordsEl = await $("#words");
  const words = await wordsEl.$$eval(".word", els =>
    els.map(el => el.innerHTML)
  );
  console.log(words);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
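To see why the original callback returned undefined, here is a minimal sketch that runs in plain Node without a browser. The `fakeDivs` objects are stand-ins for DOM elements and the callback names are hypothetical; only the shape of the callbacks matches what would be passed to .$$eval.

```javascript
// Plain objects stand in for the matched <div> elements (an assumption
// made so the example runs without Puppeteer or a browser).
const fakeDivs = [
  {innerHTML: "<letter>h</letter><letter>i</letter>"},
  {innerHTML: "<letter>y</letter><letter>o</letter>"},
];

// The question's callback: reads a property off the array itself.
// Arrays have no innerHTML property, so this evaluates to undefined.
const brokenCallback = words => words.innerHTML;

// The usual $$eval pattern: map over the array of elements.
const mappedCallback = words => words.map(word => word.innerHTML);

console.log(brokenCallback(fakeDivs)); // undefined
console.log(mappedCallback(fakeDivs));
```

With .$eval the callback receives a single element, so reading .innerHTML directly works; with .$$eval it receives an array, so the property has to be read from each item.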
I'm not sure the HTML is that useful here, though. I'd suggest .textContent. If your goal is to type the words into the input, you might try something like:
const puppeteer = require("puppeteer"); // ^19.0.0
let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  const url = "https://monkeytype.com/";
  // Only let requests to the listed origins through; abort everything else.
  await page.setRequestInterception(true);
  const allowed = [
    "https://monkeytype.com",
    "https://www.monkeytype.com",
    "https://api.monkeytype.com",
    "https://fonts.google",
  ];
  page.on("request", request => {
    if (allowed.some(e => request.url().startsWith(e))) {
      request.continue();
    }
    else {
      request.abort();
    }
  });
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const $ = (...args) => page.waitForSelector(...args);
  await (await $(".rejectAll")).click();
  await $("#words .word.active");
  const words = await page.$("#words");
  try {
    for (;;) {
      // Read the currently active word and type it, followed by a space.
      const word = await words.$eval(".word.active", el =>
        el.textContent.trim()
      );
      await words.type(word + " ");
    }
  }
  catch (err) {} // the loop exits when $eval or type throws, presumably because the test has ended
  const results = await $("#result");
  await results.evaluate(el => el.scrollIntoView());
  await results.screenshot({path: "typing-results.png"});
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
Optimization is left as an exercise here. See also this answer for another typing test, which is a harder page to automate for various subtle reasons.
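As one possible starting point for that exercise, the words could be collected in a single .$$eval call and typed as one string rather than one round trip per word. The sketch below shows only the data handling and runs in plain Node. `collectWords` and `toTypingInput` are hypothetical helper names, and the plain objects stand in for the .word elements. It also assumes every word is already in the DOM, which may not hold if the page appends words as you type.

```javascript
// Hypothetical helpers, not part of Puppeteer.
// collectWords has the shape of a $$eval callback: it receives an array
// of elements and returns their trimmed text.
const collectWords = els => els.map(el => el.textContent.trim());

// Append a space after every word, since the space advances to the next word.
const toTypingInput = words => words.map(word => word + " ").join("");

// Plain objects stand in for .word elements so this runs without a browser.
const fakeWordEls = [{textContent: " interest "}, {textContent: "point"}];
const input = toTypingInput(collectWords(fakeWordEls));
console.log(JSON.stringify(input)); // "interest point "
```

In the script above, this might look like `const list = await words.$$eval(".word", collectWords);` followed by `await words.type(toTypingInput(list));`, though I haven't verified that against the live page.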