I have a web scraper that checks a URL and takes a screenshot of the page. I want an interval between each check, because the website is sensitive to too many requests per minute. Here is my code; I would appreciate any help, as I am totally stuck:
const puppeteer = require('puppeteer');
process.setMaxListeners(Infinity)
const fs = require('fs');
const csv = require('csv-parser');
const { setTimeout } = require('timers/promises');
var inputFile = 'app_ids.csv';
fs.createReadStream(inputFile)
  .pipe(csv())
  .on('data', function (data) {
    try {
      // console.log(data.app_id);
      (async () => {
        let app_id = data.app_id;
        console.log(app_id);
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.goto('https://evisatraveller.mfa.ir/fa/request/state/' + app_id, { waitUntil: 'load', timeout: 0 });
        await page.screenshot({ path: 'output/' + app_id + '.png' });
        await browser.close();
      })();
    }
    catch (err) {
      console.log('Error:', err);
      //error handler
    }
  })
  .on('end', function () {
    //some final operation
  });
If I set the timeout to any number, it doesn't work. I assume using an interval is the only way, but I don't know where to implement it.
CodePudding user response:
You can add this function at the end of your code:
function blockingWait(seconds) {
  var waitTill = new Date(new Date().getTime() + seconds * 1000);
  // Busy-wait: spin until the target time has passed
  while (waitTill > new Date()) {}
}
Then call blockingWait(15); after the try-catch block. That gives a 15-second wait before or after the browser closes, depending on where you place the call. Change the 15 to whatever number you choose.
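For illustration, a sketch of where that call would sit in the question's data handler. Note that the async IIFE is not awaited, so the busy-wait only spaces out the iterations; it also stalls the event loop, which pauses the pending Puppeteer promises (the next answer picks up on this):
.on('data', function (data) {
  try {
    (async () => {
      // ...launch browser, goto, screenshot, close (as in the question)...
    })();
  }
  catch (err) {
    console.log('Error:', err);
  }
  blockingWait(15); // busy-wait 15 seconds before the next row is processed
})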
CodePudding user response:
It seems that when reading a CSV this way you won't be able to wait. Waiting === blocking the main thread, which would spike the memory, because the rows keep arriving in the
.on('data', function (data) {
  //
})
handler.
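A non-blocking sketch of that idea, under the assumption that buffering the rows first is acceptable: collect every row on 'data', then on 'end' loop over them sequentially, pausing with the promise-based setTimeout the question already imports from 'timers/promises'. The 15000 ms delay and the reuse of a single browser are illustrative choices, not requirements:
const puppeteer = require('puppeteer');
const fs = require('fs');
const csv = require('csv-parser');
const { setTimeout } = require('timers/promises');

const rows = [];
fs.createReadStream('app_ids.csv')
  .pipe(csv())
  .on('data', (data) => rows.push(data)) // only buffer here; no blocking work
  .on('end', async () => {
    const browser = await puppeteer.launch(); // one browser reused across rows
    for (const { app_id } of rows) {
      try {
        const page = await browser.newPage();
        await page.goto('https://evisatraveller.mfa.ir/fa/request/state/' + app_id, { waitUntil: 'load', timeout: 0 });
        await page.screenshot({ path: 'output/' + app_id + '.png' });
        await page.close();
      } catch (err) {
        console.log('Error:', err);
      }
      await setTimeout(15000); // non-blocking 15-second pause between requests
    }
    await browser.close();
  });
Because await setTimeout() suspends only this async function, the event loop stays free while the script still makes exactly one request per interval.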