How to set an interval for checking a specific URL several times in a Node.js web scraper (Puppeteer)

Time:04-08

I have a web scraper that checks a URL and takes a screenshot of the page. I want an interval between checks, because the website is sensitive to too many requests per minute. Here is my code; I would appreciate any help, as I am totally stuck:

const puppeteer = require('puppeteer');
process.setMaxListeners(Infinity)
const fs = require('fs');
const csv = require('csv-parser');
const { setTimeout } = require('timers/promises');

var inputFile = 'app_ids.csv';
fs.createReadStream(inputFile)
.pipe(csv())
.on('data', function (data) {
    try {
        // console.log(data.app_id);
        (async () => {
            let app_id = data.app_id
            console.log(app_id)
            const browser = await puppeteer.launch();
            const page = await browser.newPage();
            

            await page.goto('https://evisatraveller.mfa.ir/fa/request/state/' + app_id, { waitUntil: 'load', timeout: 0 });
            await page.screenshot({ path: 'output/' + app_id + '.png' });


            await browser.close();

    
        })();
    }
    catch (err) {
        console.log('Error:', err)
        //error handler
    }
})
.on('end', function () {
    //some final operation
});

If I set timeout to any number, it doesn't work. I assume using an interval is the only way, but I don't know where to implement it.

CodePudding user response:

You can add the function below at the end of your code.

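// Busy-waits for the given number of seconds.
// Note: this blocks the event loop for the whole duration.
function blockingWait(seconds) {
    var waitTill = new Date(new Date().getTime() + seconds * 1000);
    while (waitTill > new Date()) {}
}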

Then add blockingWait(15); after the try-catch block.

This adds a 15-second wait, which may fall before or after the browser closes.

Change 15 to whatever number of seconds you need.
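
A non-blocking alternative, as a minimal sketch: the busy-wait loop above holds the event loop, so pending Puppeteer work cannot make progress during the wait. The sketch below instead awaits the promise-based setTimeout from timers/promises (which the question's code already imports) between items. The names processSequentially, handler, and delayMs are hypothetical and not part of the original code.

```javascript
const { setTimeout: sleep } = require('timers/promises');

// Hypothetical helper: runs `handler` on each item one at a time,
// pausing `delayMs` milliseconds between consecutive items.
async function processSequentially(items, handler, delayMs) {
    const results = [];
    for (let i = 0; i < items.length; i++) {
        if (i > 0) {
            // Non-blocking pause: the event loop stays free while waiting.
            await sleep(delayMs);
        }
        // The next item starts only after this one has finished.
        results.push(await handler(items[i]));
    }
    return results;
}
```

In the question's script, the 'data' handler could push each data.app_id into an array, and the 'end' handler could then call processSequentially(appIds, takeScreenshot, 15000), where takeScreenshot is an assumed async function wrapping the goto and screenshot calls.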

CodePudding user response:

It seems that while reading a CSV this way you won't be able to wait. Waiting here means blocking the main thread, which would cause memory usage to spike.

.on('data', function (data) {
    // rows are delivered here as they are parsed
})
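
One way to wait between rows without blocking, as a hedged sketch: Node.js Readable streams can be consumed with for await, which pulls the next row only after the loop body has finished, so an awaited delay fits naturally between rows. The names processRows, rowStream, handleRow, and delayMs are hypothetical.

```javascript
const { setTimeout: pause } = require('timers/promises');

// Hypothetical helper: consumes an object stream row by row,
// awaiting `handleRow` for each row and pausing `delayMs` ms between rows.
async function processRows(rowStream, handleRow, delayMs) {
    let count = 0;
    for await (const row of rowStream) {
        if (count > 0) {
            await pause(delayMs); // non-blocking wait between rows
        }
        await handleRow(row);
        count++;
    }
    return count; // number of rows handled
}
```

Assuming the stream returned by .pipe(csv()) supports async iteration like other Node.js Readable streams, it could be passed in directly, for example processRows(fs.createReadStream(inputFile).pipe(csv()), takeScreenshot, 15000), with takeScreenshot being an assumed async function that opens the page and saves the screenshot.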