I am having a problem where I can only see the contents of my output array variables from inside my main Puppeteer function, scrapeEbay().
Is this because the function is asynchronous? My intent is for the main function to be able to see the contents of the output arrays, and to return them. But they show up empty unless I log those arrays inside the scrapeEbay() function.
Code is below; I'm just curious what the reasoning behind this is.
const puppeteer = require("puppeteer");

var outputPrices = [];
var outputItems = [];
async function scrapeEbay(inputString) {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
// Open advanced search on ebay
await page.goto("https://ebay.com/sch/ebayadvsearch");
await page.type("#_nkw", inputString);
// Make selections for condition and sold status
await page.click("#LH_Sold");
await page.click("#LH_ItemConditionUsed");
// Get rid of international listings
await page.click("#LH_LocatedInRadio");
// submit ebay advanced search form
await page.click("#searchBTNLowerLnk");
await page.waitForSelector("span.s-item__price");
const scrapedPrices = await page.$$eval("span.s-item__price", (spans) => {
return [...spans].slice(1).map((span) => {
// https://stackoverflow.com/a/42309034
// Slice this string to get the desired pricing; instead of the daughter <span /> tag
var slicedNumber = span.innerHTML.slice(24, -7);
// Remove commas from the number so they don't break parseFloat below
var splitNumber = slicedNumber.replace(/,/g, "");
var price = parseFloat(splitNumber);
return price;
});
});
outputItems.push(inputString);
// median() is assumed to be defined or imported elsewhere
outputPrices.push(median(scrapedPrices));
await browser.close();
console.log([outputItems], [outputPrices]);
}
async function main(inputArray) {
for (let i = 0; i < inputArray.length; i++) {
scrapeEbay(inputArray[i]);
}
console.log([outputItems], [outputPrices]);
}
main(["i7 6700k", "gtx 970"]);
CodePudding user response:
Fix:
Change this:
scrapeEbay(inputArray[i]);
to:
await scrapeEbay(inputArray[i]);
Explanation:
When main runs, it starts scrapeEbay, but because scrapeEbay is an async function, the loop does not wait for it to finish. As soon as the call hits its first await, control returns to main, which immediately starts the next scrapeEbay call in the loop. Once the loop has kicked off every call, main moves on to its console.log, while the awaited work inside each scrapeEbay call is still pending. So at the moment the console.log runs, none of the scrapeEbay calls have finished and the output arrays are still empty. Putting await in front of scrapeEbay makes the loop wait for each call to complete before moving on, so the console.log only runs after all the scraping is done.
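For reference, here is a minimal sketch of the corrected main with that one change applied (everything else in your code stays the same):
async function main(inputArray) {
  for (let i = 0; i < inputArray.length; i++) {
    // Wait for each scrape to finish before starting the next one
    await scrapeEbay(inputArray[i]);
  }
  // By the time this runs, every scrapeEbay call has completed,
  // so outputItems and outputPrices are populated
  console.log([outputItems], [outputPrices]);
}
main(["i7 6700k", "gtx 970"]);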
Suggestion:
Consider using Promise.all if you plan on passing in a larger array. That way all the calls to scrapeEbay run concurrently instead of one after another, which makes the whole run finish much sooner.
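A minimal sketch of that approach, reusing the scrapeEbay function above as-is:
async function main(inputArray) {
  // Start every scrape at once and wait until all of them have finished
  await Promise.all(inputArray.map((item) => scrapeEbay(item)));
  console.log([outputItems], [outputPrices]);
}
main(["i7 6700k", "gtx 970"]);
Keep in mind that the calls finish in whatever order the pages load, so results may be pushed into outputItems and outputPrices in a different order than the input array, and each call still launches its own browser instance, meaning several browser windows will be open at the same time.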