I'm trying to get all the links in the div "Notes et références" on this page:
https://fr.wikipedia.org/wiki/Barack_Obama
but it seems that I don't have the right selector. I tried this but it didn't work:
const scrollable_section = '#mw-content-text > div.mw-parser-output > div.reference-cadre';
await page.evaluate(selector => {
  const element = document.querySelector(selector);
  element.scrollTop = element.offsetHeight;
}, scrollable_section);
Can somebody help me?
I'm new to Puppeteer, so I might need some more explanation.
CodePudding user response:
Just because the element is scrollable doesn't mean you actually need to scroll to get the data. It's generally only for JS-driven, dynamic feeds that you need to mess with scrolling.
In this case, the data is available statically, so unless you're using Puppeteer for some other reason, you could just do this with a simpler and probably faster Axios/Cheerio combo.
Even better is to use Wikipedia's API rather than scraping the data. If you do scrape, please respect their limits for robots.
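As a rough sketch of the API route, the snippet below builds a request to the MediaWiki Action API (action=parse with prop=externallinks) and reads the list of external links from the response. The helper names buildExternalLinksUrl and fetchExternalLinks and the User-Agent string are placeholders I made up. It also assumes Node 18+ for the built-in fetch. Note that this returns external links for the whole page, not only the "Notes et références" block, so it may or may not fit your use case.

```javascript
// Hypothetical helper: build an Action API URL that asks for the
// external links of a single page on a given language edition.
function buildExternalLinksUrl(title, lang = "fr") {
  const params = new URLSearchParams({
    action: "parse",
    page: title,
    prop: "externallinks",
    format: "json",
  });
  return `https://${lang}.wikipedia.org/w/api.php?${params}`;
}

// Hypothetical helper: fetch the URL above and return the array of links.
// Assumes a global fetch (Node 18+). The User-Agent value is a placeholder;
// replace it with something that identifies your own script.
async function fetchExternalLinks(title, lang = "fr") {
  const response = await fetch(buildExternalLinksUrl(title, lang), {
    headers: { "User-Agent": "links-example/0.1 (you@example.com)" },
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  if (data.error) {
    throw new Error(data.error.info);
  }
  return data.parse.externallinks;
}
```

Calling fetchExternalLinks("Barack_Obama").then(links => console.log(links.length, links.slice(0, 5))) should print the count and the first few URLs.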
Continuing on with Puppeteer, Wikipedia has odd page structures that don't nest sections. After selecting #Notes_et_références, you can pop up to the parent <h2>, then iterate a couple of sibling nodes forward until you're at the .reference-cadre element (I hardcoded this relationship, but you could make it more dynamic with a loop if being a bit more future-proof is a goal).
const puppeteer = require("puppeteer");

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  const url = "https://fr.wikipedia.org/wiki/Barack_Obama";
  await page.goto(url);

  // From the #Notes_et_références element, go up to the parent <h2>, then
  // two siblings forward to reach .reference-cadre and collect its hrefs.
  const links = await page.evaluate(() =>
    [...document.querySelector("#Notes_et_références")
      .parentNode
      .nextElementSibling
      .nextElementSibling
      .querySelectorAll("a")]
      .map(e => e.getAttribute("href"))
  );
  console.log(links.length, links.slice(0, 5));
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close()); // always close the browser, even on error
Output:
809 [
  '#cite_ref-prononciation_1-0',
  '#cite_ref-prononciation_1-1',
  '/wiki/Prononciation_de_l'anglais',
  '/wiki/Anglais_américain',
  '/wiki/Transcription_phonétique'
]
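If you'd rather not hardcode the two sibling hops, here is a minimal sketch of the loop idea mentioned above. The function name collectReferenceLinks and its two parameters are my own invention for illustration. It walks forward from the heading's parent until it finds an element matching the target selector, and returns an empty array if the heading or the target isn't found. It has to be self-contained, since Puppeteer serializes the function before running it in the page.

```javascript
// Hypothetical helper, meant to be passed to page.evaluate().
// It relies on the page's global `document`, so it runs in the browser context.
function collectReferenceLinks(headingSelector, targetSelector) {
  const heading = document.querySelector(headingSelector);
  if (!heading) return [];

  // Start at the sibling after the heading's parent and walk forward
  // until an element matches targetSelector (or the siblings run out).
  let node = heading.parentNode.nextElementSibling;
  while (node && !node.matches(targetSelector)) {
    node = node.nextElementSibling;
  }
  if (!node) return [];

  // Same extraction as before: the raw href attribute of every anchor.
  return [...node.querySelectorAll("a")].map(a => a.getAttribute("href"));
}
```

In the script above, this would replace the inline callback: const links = await page.evaluate(collectReferenceLinks, "#Notes_et_références", ".reference-cadre");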