I'm using Puppeteer to scrape a transaction history, but I'm having trouble mapping the scraped data (innerText) from inside each element into the same shape as the JSON sample below.
How can I map the scraped data like this?
Expected JSON sample:
[
  {
    date: 'Today',
    transactions: [
      { amount: '-28,000' },
      { amount: ' 5,000' }
    ]
  },
  {
    date: 'Yesterday',
    transactions: [
      { amount: '-24,000' },
      { amount: '-141,000' },
      { amount: ' 50,000' }
    ]
  }
]
HTML:
<div >
  <div >Today</div>
  <div >
    <div >
      <div >
        <div >
          <div >Bobby Timmy</div>
          <div >
            Lorem ipsum dolor sit amet consectetur adipisicing elit. Rerum, distinctio?
          </div>
        </div>
        <div >
          <div >-28,000</div>
        </div>
      </div>
    </div>
    <div >
      <div >
        <div >
          <div >John Doe</div>
          <div >
            Lorem ipsum dolor sit amet consectetur adipisicing elit. Rerum, distinctio?
          </div>
        </div>
        <div >
          <div > 5,000</div>
        </div>
      </div>
    </div>
    <div >
      <div >
        <div >
          <div >Outgoing money</div>
          <div >
            Lorem ipsum dolor sit amet consectetur adipisicing elit. Rerum, distinctio?
          </div>
        </div>
        <div >
          <div >-5,000</div>
        </div>
      </div>
    </div>
  </div>
</div>
<div >
  <div >Yesterday</div>
  <div >
    <div >
      <div >
        <div >
          <div >Adam Crash</div>
          <div >
            Lorem ipsum dolor sit amet consectetur adipisicing elit. Rerum, distinctio?
          </div>
        </div>
        <div >
          <div >-24,000</div>
        </div>
      </div>
    </div>
    <div >
      <div >
        <div >
          <div >Alexi Pattim</div>
          <div >
            Lorem ipsum dolor sit amet consectetur adipisicing elit. Rerum, distinctio?
          </div>
        </div>
        <div >
          <div >-141,000</div>
        </div>
      </div>
    </div>
    <div >
      <div >
        <div >
          <div >McKenzy Smithy</div>
          <div >
            Lorem ipsum dolor sit amet consectetur adipisicing elit. Rerum, distinctio?
          </div>
        </div>
        <div >
          <div > 50,000</div>
        </div>
      </div>
    </div>
  </div>
</div>
My Puppeteer code:
await page.$$eval('.transaction-list-group', (nodes) => nodes.map(element => ({
  date: element.querySelector(".transaction-list-group__header").innerText,
  transactions: [
    {
      amount: element.querySelector(".transaction-amount__currency-amount").innerText
    }
  ]
})));
Answer:
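Your code calls querySelector for the amount, which returns only the first matching element, and wraps that one result in a single-element array literal, so every date ends up with exactly one transaction.
When you want to query a single element, use querySelector. When you want to query multiple elements, use querySelectorAll. It returns a NodeList rather than an array, so spread it into an array first; then you can map over the results to grab each element's text content and transform it into the desired object: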
const puppeteer = require("puppeteer"); // ^19.0.0
require("util").inspect.defaultOptions.depth = null; // print nested arrays in full

const html = `<your HTML copied verbatim from original post>`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);

  // One output object per date group.
  const result = await page.$$eval(".transaction-list-group", els =>
    els.map(el => ({
      // querySelector: each group has a single header.
      date: el.querySelector(".transaction-list-group__header")
        .textContent.trim(),
      // querySelectorAll: every amount in the group, spread into an array so map() is available.
      transactions:
        [...el.querySelectorAll(".transaction-amount__currency-amount")]
          .map(amountEl => ({amount: amountEl.textContent.trim()}))
    }))
  );
  console.log(result);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
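The difference between the two mappings can be checked without launching a browser. The sketch below is a DOM-free approximation, not Puppeteer code: FakeElement and FakeNodeList are hypothetical stand-ins for DOM nodes that support only single-class selectors like ".name", the class names are taken from the selectors in the question, and the amounts come from the posted HTML. Both mappings read textContent, since the fakes model only that property.

```javascript
// HYPOTHETICAL stand-ins for DOM nodes; not part of Puppeteer or the DOM API.
// FakeNodeList mimics a NodeList: array-like and iterable, but with no map().
class FakeNodeList {
  constructor(items) {
    this.items = items;
    this.length = items.length;
  }
  item(i) {
    return this.items[i] ?? null;
  }
  [Symbol.iterator]() {
    return this.items[Symbol.iterator]();
  }
}

// FakeElement supports only ".className" selectors (an assumption for brevity).
class FakeElement {
  constructor(className, ownText = "", children = []) {
    this.className = className;
    this.ownText = ownText;
    this.children = children;
  }
  // All descendants in document (depth-first) order.
  descendants() {
    return this.children.flatMap(c => [c, ...c.descendants()]);
  }
  querySelectorAll(selector) {
    const cls = selector.replace(/^\./, "");
    return new FakeNodeList(this.descendants().filter(d => d.className === cls));
  }
  // Like the DOM method: only the FIRST match, or null.
  querySelector(selector) {
    return this.querySelectorAll(selector).item(0);
  }
  get textContent() {
    return this.ownText + this.children.map(c => c.textContent).join("");
  }
}

// Hypothetical helper building one date group with its amounts.
const group = (date, amounts) =>
  new FakeElement("transaction-list-group", "", [
    new FakeElement("transaction-list-group__header", date),
    ...amounts.map(a => new FakeElement("transaction-amount__currency-amount", a)),
  ]);

const groups = [
  group("Today", ["-28,000", " 5,000", "-5,000"]),
  group("Yesterday", ["-24,000", "-141,000", " 50,000"]),
];

// The question's mapping: querySelector plus a one-element array literal.
const firstOnly = groups.map(el => ({
  date: el.querySelector(".transaction-list-group__header").textContent,
  transactions: [
    { amount: el.querySelector(".transaction-amount__currency-amount").textContent },
  ],
}));

// The answer's mapping: spread querySelectorAll into an array, then map.
const allAmounts = groups.map(el => ({
  date: el.querySelector(".transaction-list-group__header").textContent.trim(),
  transactions: [...el.querySelectorAll(".transaction-amount__currency-amount")]
    .map(amountEl => ({ amount: amountEl.textContent.trim() })),
}));

console.log(JSON.stringify(firstOnly));
console.log(JSON.stringify(allAmounts));
```

Running it with node prints one transaction per date for the first mapping and all three per date for the second; note that trim() drops the leading space seen in amounts such as " 5,000".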