Here is my code, in which I get the element handles of some target divs:
const puppeteer = require("puppeteer");

(async () => {
  const searchString = `https://www.google.com/maps/search/restaurants/@-6.4775265,112.057849,3.67z`;
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(searchString);
  const xpath_expression = '//div[contains(@aria-label, "Results for")]/div/div[./a]';
  await page.waitForXPath(xpath_expression);
  const targetDivs = await page.$x(xpath_expression);
  // const link_urls = await page.evaluate((...targetDivs) => {
  //   return targetDivs.map((e) => {
  //     return e.textContent;
  //   });
  // }, ...targetDivs);
})();
Inside each of these target divs, two relative XPath expressions select the data I need:
'link' : './a/@href'
'title': './a/@aria-label'
I have sample Python code that does something similar:
from parsel import Selector

response = Selector(page_content)
results = []
for el in response.xpath('//div[contains(@aria-label, "Results for")]/div/div[./a]'):
    results.append({
        'link': el.xpath('./a/@href').extract_first(''),
        'title': el.xpath('./a/@aria-label').extract_first('')
    })
How can I do the same in Puppeteer?
CodePudding user response:
I think you can get the href and ariaLabel property values with, e.g.:
const targetDivs = await page.$x(xpath_expression);
// for...of (rather than forEach with an async callback) so each await
// finishes before the next div is processed
for (const [pos, div] of targetDivs.entries()) {
  const links = await div.$x('a[@href]');
  const href = await (await links[0].getProperty('href')).jsonValue();
  const ariaLabel = await (await links[0].getProperty('ariaLabel')).jsonValue();
  console.log(pos, href, ariaLabel);
}
These are the element properties, not the attribute values. In the case of href, that might mean, for instance, that you get an absolute URL instead of a relative one, but I haven't checked whether it makes a difference for that particular page. I am not sure whether $x allows selecting attribute nodes or string values directly; the documentation only talks about element handles.
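If you need the attribute values themselves, as in the Python sample, one option is to read them inside the page with getAttribute. The following is only a minimal sketch that I have not run against that page: extractResults is a hypothetical helper name, and it assumes a Puppeteer version in which element handles still provide $x, as the code in the question does.

```javascript
// Hypothetical helper: builds the same list of { link, title } objects as the
// Python sample, using attribute values rather than DOM properties.
// `targetDivs` is assumed to be the array returned by page.$x(xpath_expression).
async function extractResults(targetDivs) {
  const results = [];
  for (const div of targetDivs) {
    // Relative XPath, evaluated with the div as the context node.
    const [anchor] = await div.$x('./a');
    if (!anchor) continue; // skip divs without a child <a>
    // The callback runs in the page; getAttribute returns the raw attribute
    // string, or null when the attribute is absent (mapped to '' here).
    const item = await anchor.evaluate((a) => ({
      link: a.getAttribute('href') ?? '',
      title: a.getAttribute('aria-label') ?? '',
    }));
    results.push(item);
  }
  return results;
}
```

Inside the async function from the question it could be called as `const results = await extractResults(targetDivs);`. Reading `a.href` instead of `a.getAttribute('href')` in the callback would give the resolved, absolute URL, which is the property/attribute difference mentioned above.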