How to perform recurring long-running background tasks in a node.js web server

Time:12-16

I'm working on a node.js web server using express.js that should offer a dashboard to monitor database servers.

The architecture is quite simple:

  • a gatherer retrieves the information in a predefined interval and stores the data
  • express.js listens to user requests and shows a dashboard based on the stored data

I'm now wondering how best to implement the gatherer so that it does not block the main loop. The simplest solution seems to be a setTimeout-based approach, but what would be the "proper way" to architect this?

CodePudding user response:

Your concern is your information-gathering step. It probably is not as CPU-intensive as it seems. Because it's a monitoring app, it probably gathers information by contacting other machines, something like this.

async function gather () {
    const results = []
    let result
    result = await getOracleMetrics ('server1')
    results.push(result)
    result = await getMySQLMetrics ('server2')
    results.push(result)
    result = await getMySQLMetrics ('server3')
    results.push(result)
    await storeMetrics(results)
}

This is not a CPU-intensive function. (If you were doing a fast Fourier transform on an image, that would be a CPU-intensive function.)

It spends most of its time awaiting results, and then a little time storing them. Using async / await gives you the illusion that it runs synchronously, but each await hands control back to the main loop, which is then free to do other things, such as serving requests, until the awaited result arrives.
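To see that yielding in action, here is a minimal sketch. The fakeFetch helper is hypothetical: it stands in for calls like getOracleMetrics() and simply resolves after a delay. The timer plays the role of an incoming request.

```javascript
// Hypothetical stand-in for a network call such as getOracleMetrics():
// it resolves after `ms` milliseconds and uses no CPU while waiting.
function fakeFetch (name, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms))
}

async function gatherDemo (log) {
  log.push('gather: start')
  await fakeFetch('server1', 50) // the main loop is free during this wait
  log.push('gather: got server1')
  await fakeFetch('server2', 50)
  log.push('gather: got server2')
}

async function demo () {
  const log = []
  // Simulates a user request that arrives while gatherDemo is still awaiting.
  setTimeout(() => log.push('request handled'), 25)
  await gatherDemo(log)
  return log
}

demo().then((log) => console.log(log))
```

The "request handled" entry should land between "gather: start" and "gather: got server1", because the 25 ms timer fires while the first await is still pending.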

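You might invoke it every minute, something like this. Calling gather() starts the work and returns a promise right away; the .catch() logs any error so that a failed run does not turn into an unhandled rejection.

setInterval (
  function go () {
    gather()
      .catch(console.error)
  }, 1000 * 60)   // 60 seconds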
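One thing to watch with setInterval is that it fires on schedule even if the previous run has not finished. Here is a minimal sketch of the setTimeout-based variant mentioned in the question, which schedules the next run only after the current one completes. The name runRepeatedly is my own, not part of any library.

```javascript
// Runs `task` repeatedly, waiting `delayMs` after each run finishes
// before starting the next one, so two runs can never overlap.
function runRepeatedly (task, delayMs) {
  let timer = null
  let stopped = false

  async function tick () {
    try {
      await task()
    } catch (err) {
      console.error('task failed:', err) // keep the loop alive on errors
    }
    if (!stopped) {
      timer = setTimeout(tick, delayMs)
    }
  }

  tick() // first run starts immediately

  // The returned function stops the loop; a run already in progress
  // is allowed to finish but is not rescheduled.
  return function stop () {
    stopped = true
    clearTimeout(timer)
  }
}
```

With the gather() function from above, usage would look like `const stop = runRepeatedly(gather, 1000 * 60)`. Note that the period then becomes the run time plus the delay, which is usually fine for a dashboard.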

If you do actually have some cpu-intensive computation to do, you have a few choices.

  1. offload it to a worker thread.

  2. break it up into short chunks, with sleeps between them.

    // Resolves after howLong milliseconds.
    const sleep = function sleep (howLong) {
      return new Promise(function (resolve) {
        setTimeout(() => {resolve()}, howLong)
      })
    }
    
    async function gather () {
       for (let chunkNo = 0; chunkNo < 100; chunkNo++) {
           doComputationChunk(chunkNo)
           await sleep(1)   // let other work run between chunks
       }
    }
    

That sleep() function yields to the main loop by waiting for a timeout to expire.
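For the first choice, offloading to a worker thread, here is a minimal sketch using Node's built-in worker_threads module. The summing loop is only a placeholder for real CPU-heavy work, and computeInWorker is a name I made up. The worker source is inlined as a string to keep the example in one file; normally it would live in its own file.

```javascript
const { Worker } = require('worker_threads')

// Source of the worker, inlined so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads')
  // Placeholder for heavy work: sum the integers 1..n in a tight loop.
  let sum = 0
  for (let i = 1; i <= workerData.n; i++) sum += i
  parentPort.postMessage(sum)
`

// Runs the computation on a separate thread and resolves with its result.
// The main loop stays free to serve requests in the meantime.
function computeInWorker (n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } })
    worker.once('message', resolve)
    worker.once('error', reject)
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`))
    })
  })
}
```

Calling `await computeInWorker(1000)` from an async function gives back the sum without the loop ever running on the main thread.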

None of this is debugged, sorry to say.

CodePudding user response:

For recurring tasks I prefer to use node-schedule and schedule the jobs on app start-up.

If you don't want to run CPU-expensive tasks on the main thread, you can always run the code below in a worker thread instead.

Here are two examples, one with a recurrence rule and one with interval in minutes using a cron expression:

app.js

let mySheduler = require('./mysheduler.js');

mySheduler.sheduleRecurrence();

// And/Or

mySheduler.sheduleInterval();

mysheduler.js

/* INFO: Require node-schedule for starting jobs of scheduled tasks */
var schedule = require('node-schedule');


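/* INFO: Helper for constructing a cron-expression */
function getCronExpression(minutes) {
  if (minutes < 60) {
    return `*/${minutes} * * * *`;
  }
  else {
    // A single cron expression cannot describe e.g. "every 90 minutes",
    // so the interval is rounded down to whole hours and the job fires
    // at minute 0. (A remainder of 0 would otherwise produce "*/0",
    // which is not a valid cron field.)
    let hours = Math.floor(minutes / 60);
    return `0 */${hours} * * *`;
  }
}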

module.exports = {

   sheduleRecurrence: () => {

      // Schedule a job @ 01:00 AM every day (Mo-Su)
      var rule = new schedule.RecurrenceRule();
      rule.hour = 1;    // plain numbers: 01 and 00 are legacy octal literals
      rule.minute = 0;
      rule.second = 0;
      rule.dayOfWeek = new schedule.Range(0,6);
      var dailyJob = schedule.scheduleJob(rule, function(){
          /* INFO: Put your database-ops or other routines here */
          // ...
          // ..
          // .
      });
      // INFO: Verbose output to check if job was scheduled:
      console.log(`JOB:\n${dailyJob}\n HAS BEEN SCHEDULED..`);
   },

   sheduleInterval: () => {

      let intervalInMinutes = 60;
      let cronExpressions = getCronExpression(intervalInMinutes);
      // INFO: Define unique job-name in case you want to cancel it
      let uniqueJobName = "myIntervalJob"; // should be unique
      // INFO: Schedule the job
      var job = schedule.scheduleJob(uniqueJobName,cronExpressions, function() {
          /* INFO: Put your database-ops or other routines here */
          // ...
          // ..
          // .
      })
      // INFO: Verbose output to check if job was scheduled:
      console.log(`JOB:\n${job}\n HAS BEEN SCHEDULED..`);
   }

}

In case you want to cancel a job, you can use its unique job-name:

function cancelCronJob(uniqueJobName) {
  /* INFO: Get job-instance for canceling scheduled task/job */
  let current_job = schedule.scheduledJobs[uniqueJobName];
  if (!current_job) {
    /* INFO: Cron-job not found (already cancelled or unknown) */
    console.log(`CRON JOB WITH UNIQUE NAME: '${uniqueJobName}' UNDEFINED OR ALREADY CANCELLED..`);
  }
  else {
    /* INFO: Cron-job found and cancelled */
    console.log(`CANCELLING CRON JOB WITH UNIQUE NAME: '${uniqueJobName}'`)
    current_job.cancel();
  }
}

In my example the recurrence and the interval are hardcoded. Obviously you can also pass the recurrence rules or the interval as an argument to the respective function.

As per your comment:

'When looking at the implementation of node-schedule it feels like a thin layer on top of setTimeout..'

Actually, node-schedule uses long-timeout -> https://www.npmjs.com/package/long-timeout so you are right, it's basically a convenient layer on top of timeouts.
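The reason such a layer is needed at all: setTimeout only accepts delays up to 2^31 - 1 milliseconds (roughly 24.8 days), and Node falls back to a 1 ms delay for anything larger. Below is a rough sketch of the general idea of chaining shorter timeouts. It is not the actual long-timeout source, and the names splitDelay and longTimeout are my own.

```javascript
const MAX_DELAY = 2 ** 31 - 1 // largest delay setTimeout accepts, in ms

// Splits a delay into pieces that each fit within MAX_DELAY.
function splitDelay (delayMs) {
  const chunks = []
  let remaining = delayMs
  while (remaining > MAX_DELAY) {
    chunks.push(MAX_DELAY)
    remaining -= MAX_DELAY
  }
  chunks.push(remaining)
  return chunks
}

// Calls `callback` after `delayMs`, even when delayMs exceeds MAX_DELAY,
// by waiting out the chunks one after another.
// Returns a function that cancels the pending timeout.
function longTimeout (callback, delayMs) {
  const chunks = splitDelay(delayMs)
  let timer = null

  function next () {
    const delay = chunks.shift()
    // After the last chunk run the callback, otherwise wait for the next chunk.
    timer = setTimeout(chunks.length === 0 ? callback : next, delay)
  }

  next()
  return () => clearTimeout(timer)
}
```

For a dashboard that gathers every minute or every hour this limit never comes into play, so a plain setTimeout or setInterval behaves the same as the library here.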
