Random slowdowns in node.js execution


I have an optimization algorithm written in node.js that uses cpu time (measured with performance.now()) as a heuristic.

However, I noticed that occasionally some trivial lines of code would cost much more than usual.

So I wrote a test program:

const _ = require('lodash');                   // needed for _.mean / _.max / _.min below
const { performance } = require('perf_hooks'); // also available as a global in recent Node versions

const timings = [];
while (true) {
  const start = performance.now();
  // can add any trivial line of code here, or just nothing
  const end = performance.now();
  const dur = end - start;
  if (dur > 1) {
    throw [
      "dur > 1",
      {
        start,
        end,
        dur,
        timings,
        avg: _.mean(timings),
        max: _.max(timings),
        min: _.min(timings),
        last: timings.slice(-10),
      },
    ];
  }
  timings.push(dur);
}

The measurements showed an average of 0.00003ms and a peak >1ms (with the second highest <1ms but same order of magnitude).

The possible reasons I can think of are:

  1. the average timing isn't the actual time for executing the code (some compiler optimization)
  2. performance.now isn't accurate somehow
  3. cpu scheduling related - process wasn't running normally but still counted in performance.now
  4. occasionally node is doing something extra behind the scenes (GC etc. - see the sketch after this list)
  5. something happening on the hardware/os level - caching / page faults etc
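
(As a side note, hypothesis 4 is at least directly observable: Node's perf_hooks module can emit an entry for every garbage-collection event, so something like the sketch below could run alongside the timing loop to see whether the >1ms spikes line up with GC pauses. The log format is just illustrative.)

const { PerformanceObserver } = require('perf_hooks');

const gcObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // entry.startTime is relative to the process start, like performance.now()
    console.log(`GC at ${entry.startTime.toFixed(1)}ms, duration ${entry.duration.toFixed(2)}ms`);
  }
});
gcObserver.observe({ entryTypes: ['gc'] });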

Is any of these a likely reason, or is it something else?

Whatever the cause is, is there a way to make a more accurate measurement for the algorithm to use?

The outliers are currently causing the algorithm to misbehave. Without knowing how to resolve this issue, the best option seems to be using a moving average of the cost as the heuristic, but that has its own downsides.
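
In case it helps, here's a rough sketch of the moving-average fallback I mean (class name and window size are arbitrary); a median over the window would be more robust to the >1ms outliers than the mean:

class MovingCost {
  constructor(windowSize = 100) {
    this.windowSize = windowSize;
    this.samples = [];
  }
  record(durationMs) {
    // keep only the last `windowSize` raw durations
    this.samples.push(durationMs);
    if (this.samples.length > this.windowSize) this.samples.shift();
  }
  mean() {
    return this.samples.reduce((a, b) => a + b, 0) / this.samples.length;
  }
  median() {
    const sorted = [...this.samples].sort((a, b) => a - b);
    return sorted[Math.floor(sorted.length / 2)];
  }
}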

Thanks in advance!

------- Edit

I appreciate that performance.now() will never be perfectly accurate, but I was a bit surprised that the measurements could span 3-4 orders of magnitude (as opposed to 2, or ideally 1).

Would anyone have any idea/pointers as to how performance.now() works and thus what's likely the major contributor to the error range?

It'd be nice to know whether the cause is something node/v8 doesn't have control over (hardware/OS level) or something it does have control over (a node bug, options, GC related), so I can decide whether there's a way to reduce the error range before considering the tradeoffs of using an alternative heuristic.

------- Edit 2

Thanks to @jfriend00 I now realize performance.now() doesn't measure the actual CPU time the node process executed, but just the wall-clock time elapsed since the process started.

The question now is:

  1. whether there's an existing way to get the actual CPU time
  2. whether this is a feature request for node/v8
  3. or whether the node process simply doesn't have enough information from the OS to provide this
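
(One API that looks relevant for point 1 is process.cpuUsage(), which reports the user and system CPU time consumed by the process, in microseconds, either since the process started or since a previous reading. A minimal sketch, where the busy loop is just a placeholder for the code being measured:)

const beforeCpu = process.cpuUsage();

// ... the code being measured; this loop is only a placeholder ...
for (let i = 0; i < 1e7; i++) Math.sqrt(i);

const deltaCpu = process.cpuUsage(beforeCpu); // user/system CPU time since beforeCpu, in microseconds
console.log(`user: ${deltaCpu.user} µs, system: ${deltaCpu.system} µs`);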

CodePudding user response:

You're unlikely to be able to accurately measure the time for one trivial line of code. In fact, the overhead of executing performance.now() is probably many times higher than the time to execute one trivial line of code. You have to be careful that what you're measuring takes substantially longer to execute than the uncertainty or overhead of the measurement itself. Measuring very small execution times is not going to be an accurate endeavor.
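
If you do need a per-operation number, a common workaround (sketched below, with an arbitrary iteration count and placeholder body) is to time a large batch of iterations so the work dominates the timer overhead, then divide:

const { performance } = require('perf_hooks');

const ITERATIONS = 1e6; // arbitrary; pick something large enough to dominate timer overhead
let sink = 0;

const start = performance.now();
for (let i = 0; i < ITERATIONS; i++) {
  sink += i * 2; // the "trivial line" under test
}
const totalMs = performance.now() - start;

console.log(sink); // use the result so the loop isn't optimized away
console.log(`~${((totalMs / ITERATIONS) * 1e6).toFixed(3)} ns per iteration (averaged)`);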

1, 3 and 5 in your list are also all possibilities. You aren't guaranteed that your code gets a dedicated CPU core that is never interrupted to service some other thread in the system. On my Windows system, even when my nodejs app is the only "app" running, there are hundreds of other threads devoted to various OS services that may or may not request some time to run while my nodejs app is running and eventually get some time slice of the CPU core my nodejs app was using.

And, as best I know, performance.now() is just getting a high resolution timer from the OS that's relative to some epoch time. It has no idea when your thread is and isn't running on a CPU core and wouldn't have any way to adjust for that. It just gets a high resolution timestamp which you can compare to some other high resolution timestamp. The time elapsed is not CPU time for your thread. It's just clock time elapsed.
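
You can see the difference directly with something like this sketch (the 200ms delay is just illustrative): the process sits idle in a setTimeout, yet performance.now() reports roughly 200ms of elapsed wall-clock time while process.cpuUsage() reports almost no CPU time consumed:

const { performance } = require('perf_hooks');

const wallStart = performance.now();
const cpuStart = process.cpuUsage();

setTimeout(() => {
  const wallMs = performance.now() - wallStart;
  const cpu = process.cpuUsage(cpuStart); // delta since cpuStart, in microseconds
  console.log(`wall-clock elapsed: ${wallMs.toFixed(1)} ms`);
  console.log(`cpu time consumed:  ${((cpu.user + cpu.system) / 1000).toFixed(2)} ms`);
}, 200);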

CodePudding user response:

Is any of these a likely reason, or is it something else?

Yes, they all sound likely.

is there a way to make a more accurate measurement for the algorithm to use?

No, sub-millisecond time measurements are generally not reliable, and almost never a good idea. (Doesn't matter whether a timing API promises micro/nanosecond precision or whatever; chances are that (1) it doesn't hold up in practice, and (2) trying to rely on it creates more problems than it solves. You've just found an example of that.)

Even measuring milliseconds is fraught with peril. I once investigated a case of surprising performance, where it turned out that on that particular combination of hardware and OS, after 16ms of full load the CPU ~tripled its clock rate, which of course had nothing to do with the code that appeared to behave weirdly.
