I'm trying to grab products from eBay and open them on Amazon.
So far, I have them being searched on Amazon, but I'm struggling to get the products selected from the search results.
Currently it's outputting a blank array and I'm not sure why. I have tested this in a separate script without grabTitles and the for loop, so I'm guessing something in those is causing the issue.
Is there something I'm missing here that's preventing the data from coming back for prodResults?
const puppeteer = require('puppeteer');
const URL = "https://www.amazon.co.uk/";
const selectors = {
  searchBox: '#twotabsearchtextbox',
  productLinks: 'span.a-size-base-plus.a-color-base.a-text-normal',
  productTitle: '#productTitle'
};
(async() => {
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  await page.goto('https://www.ebay.co.uk/sch/jmp_supplies/m.html?_trkparms=folent:jmp_supplies|folenttp:1&rt=nc&_trksid=p2046732.m1684');
  //Get product titles from ebay
  const grabTitles = await page.evaluate(() => {
    const itemTitles = document.querySelectorAll('#e1-11 > #ResultSetItems > #ListViewInner > li > .lvtitle > .vip');
    var items = []
    itemTitles.forEach((tag) => {
      items.push(tag.innerText)
    })
    return items
  })
  //Search for the products on amazon in a new tab for each product
  for (i = 0; i < grabTitles.length; i++) {
    const page = await browser.newPage();
    await page.goto(URL)
    await page.type(selectors.searchBox, grabTitles[i++])
    await page.keyboard.press('Enter');
    //get product titles from amazon search results
    const prodResults = await page.evaluate(() => {
      const prodTitles = document.querySelectorAll('span.a-size-medium.a-color-base.a-text-normal');
      let results = []
      prodTitles.forEach((tag) => {
        results.push(tag.innerText)
      })
      return results
    })
    console.log(prodResults)
  }
})()
CodePudding user response:
You've hit on an age-old problem with Puppeteer: knowing when a page has fully finished rendering or loading.
You could try adding the following:
await page.waitForNavigation({ waitUntil: 'networkidle2' })
await page.waitForTimeout(10000)
I find networkidle2 isn't always reliable enough on its own, so I usually add an arbitrary extra waitForTimeout. You'll need to play around with the timeout value (10000 = 10 seconds) to get what you're looking for. It's not ideal, I know, but I've not found a better way.
CodePudding user response:
There are a few potential problems with the script:
- await page.keyboard.press('Enter'); triggers a navigation, but you never wait for it before trying to select the result elements. Use waitForNavigation, waitForSelector or waitForFunction (not waitForTimeout). If you do wait for a navigation, there's a special pattern using Promise.all needed to avoid a race condition, shown in the docs. Furthermore, you might be able to skip a page load by going directly to the search URL by building the string yourself.
- Your code spawns a new page for every item that needs to be processed, but these pages are never closed. I see grabTitles.length as 60, so you'll be opening 60 tabs. That's a lot of resources being wasted. On my machine, it'd probably hang everything. Just make one page and navigate it repeatedly, or close each page when you're done. If you want parallelism, consider a task queue or run a few pages simultaneously.
- grabTitles[i++]: why increment i here? It's already incremented by the loop, so this appears to skip elements, unless your selectors have duplicates or you have some other reason to do this.
- The selector span.a-size-medium doesn't work for me, which could be locality-specific. I see "a span.a-size-base-plus.a-color-base.a-text-normal" instead, but you may need to tweak this to taste.
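The idea of building the search URL yourself could be sketched roughly as follows. This is a minimal sketch rather than something verified against Amazon: buildSearchUrl is a hypothetical helper name, and it assumes the results page lives at /s and takes the query in a k parameter, which is what the address bar shows after a manual search but is not a documented API.

```javascript
// Assumption: Amazon UK serves search results at /s?k=<query>.
const AMAZON_BASE = "https://www.amazon.co.uk/";

// Hypothetical helper: turn a product title into a search results URL.
// It relies on the global URL class, so the script must not shadow it
// with its own variable named URL (the question's script does).
function buildSearchUrl(title, base = AMAZON_BASE) {
  const url = new URL("/s", base);
  // searchParams.set handles the encoding (spaces, "|", and so on).
  url.searchParams.set("k", title);
  return url.toString();
}

console.log(buildSearchUrl("Elmex Decays Prevention Toothpaste 2 x 75ml"));
// prints "https://www.amazon.co.uk/s?k=Elmex+Decays+Prevention+Toothpaste+2+x+75ml"
```

With something like this in place, page.goto(buildSearchUrl(title)) would stand in for the goto, type and Enter sequence. Waiting for the result selector is still needed afterwards.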
Here's a minimal example. I'll just do the first 2 items from the eBay array since that's coming through fine.
const puppeteer = require("puppeteer"); // ^13.5.1
let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  const titles = [
    "Chloraethyl | Dr. Henning | Spray 175 ml",
    "Elmex Decays Prevention Toothpaste 2 x 75ml",
  ];
  for (const title of titles) {
    await page.goto("https://www.amazon.co.uk/");
    await page.type("#twotabsearchtextbox", title);
    await Promise.all([
      page.keyboard.press("Enter"),
      page.waitForNavigation(),
    ]);
    const titleSel = "a span.a-size-base-plus.a-color-base.a-text-normal";
    await page.waitForSelector(titleSel);
    const results = await page.$$eval(titleSel, els =>
      els.map(el => el.textContent)
    );
    console.log(title, results.slice(0, 5));
  }
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;
Output:
Chloraethyl | Dr. Henning | Spray 175 ml [
  'Chloraethyl | Dr. Henning | Spray 175 ml',
  'Wild Fire (Shetland)',
  'A Dark Sin: A chilling British detective crime thriller (The Hidden Norfolk Murder Mystery Series Book 8)',
  'A POLICE DOCTOR INVESTIGATES: the Sussex murder mysteries (books 1-3)',
  'Rites of Spring: Sunday Times Crime Book of the Month (Seasons Quartet)'
]
Elmex Decays Prevention Toothpaste 2 x 75ml [
  'Janina Ultra White Whitening Toothpaste (75ml) – Diamond Formula. Extra Strength. Clinically Proven. Low Abrasion. For Everyday Use. Excellent for Stain Removal',
  'Elmex Decays Prevention Toothpaste 2 x 75ml',
  'Elmex Decays Prevention Toothpaste 2 x 75ml by Elmex',
  'Elmex Junior Toothpaste 2 x 75ml',
  'Elmex Sensitive Professional 2 x 75ml'
]
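If you do want to run a few pages at once rather than one after another, a small concurrency limiter is one way to go about it. The sketch below is plain JavaScript with no Puppeteer dependency. mapWithLimit is a hypothetical helper name, and the commented usage at the bottom assumes a scrapeTitle(browser, title) function of your own that opens a page, scrapes it and closes it again.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight at any time. Results keep the order of `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims the next index, awaits the worker, and repeats
  // until no items are left. Claiming is synchronous, so two runners
  // never take the same index.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({length: runnerCount}, () => runner()));
  return results;
}

// Possible usage inside the async function from the example above
// (scrapeTitle is an assumption, not part of Puppeteer):
//
//   const all = await mapWithLimit(titles, 3, title =>
//     scrapeTitle(browser, title)
//   );
```

If any worker rejects, Promise.all rejects too and the remaining runners stop picking up new items only once their current call settles, so closing each page in a finally block inside the worker is a sensible precaution.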