clock_t() overflow on 32-bit machine

Time:11-17

For statistical purposes I want to accumulate the total CPU time used by a function of a program, in microseconds. It must work on two systems: one where sizeof(clock_t) == 8 (RedHat) and another where sizeof(clock_t) == 4 (AIX). On both machines clock_t is a signed integer type and CLOCKS_PER_SEC = 1000000 (i.e., one tick per microsecond, but I don't make that assumption in the code and use the macro instead).

What I have is equivalent to something like this (but encapsulated in some fancy classes):

typedef unsigned long long u64;
u64 accum_ticks = 0;

void f()
{
   clock_t beg = clock();
   work();
   clock_t end = clock();

   accum_ticks += (u64)(end - beg); // (1)
}

u64 elapsed_CPU_us()
{
   return accum_ticks * 1e6 / CLOCKS_PER_SEC;
}

But on the 32-bit AIX machine, where clock_t is an int, it will overflow after 35m47s. Suppose that in some call beg equals 35m43s since the program started, and work() takes 10 CPU-seconds, causing end to overflow. Can I trust line (1) for this and subsequent calls to f() from then on? f() is guaranteed to never take more than 35 minutes of execution, of course.

In case I can't trust line (1) at all, even on my particular machine, what alternatives do I have that don't involve importing any third-party library? (I can't copy libraries onto the system, and I can't use <chrono> because it isn't available on our AIX machines.)

NOTE: I can use kernel headers and the precision I need is in microseconds.

CodePudding user response:

An alternate suggestion: Don't use clock. It's so underspecified it's nigh impossible to write code that will work fully portably, handling possible wraparound for 32 bit integer clock_t, integer vs. floating point clock_t, etc. (and by the time you write it, you've written so much ugliness you've lost whatever simplicity clock provided).

Instead, use getrusage. It's not perfect, and it might do a little more than you strictly need, but:

  1. The times it returns are guaranteed to operate relative to 0 (where the value returned by clock at the beginning of a program could be anything)
  2. It lets you specify if you want to include stats from child processes you've waited on (clock either does or doesn't, in a non-portable fashion)
  3. It separates the user and system CPU times; you can use either one, or both, your choice
  4. Each time is expressed explicitly in terms of a pair of values, a time_t number of seconds, and a suseconds_t number of additional microseconds. Since it doesn't try to encode a total microsecond count into a single time_t/clock_t (which might be 32 bits), wraparound can't occur until you've hit at least 68 years of CPU time (if you manage that, on a system with 32 bit time_t, I want to know your IT folks; only way I can imagine hitting that is on a system with hundreds of cores, running weeks, and any such system would be 64 bit at this point).
  5. The parts of the result you need are specified by POSIX, so it's portable to just about everywhere but Windows (where you're stuck writing preprocessor controlled code to switch to GetProcessTimes when compiled for Windows)

Conveniently, since you're on POSIX systems (I think?), clock is already expressed as microseconds, not real ticks (POSIX specifies that CLOCKS_PER_SEC equals 1000000), so the values already align. You can just rewrite your function as:

#include <sys/time.h>
#include <sys/resource.h>

static inline u64 elapsed(const struct timeval *beg, const struct timeval *end)
{
    return (end->tv_sec - beg->tv_sec) * 1000000ULL + end->tv_usec - beg->tv_usec;
}

void f()
{
   struct rusage beg, end;
   // Not checking return codes, because only two documented failure cases are passing
   // an unmapped memory address for the struct addr or an invalid who flag, neither of which
   // we're doing, easily verified by inspection
   getrusage(RUSAGE_SELF, &beg);
   work();
   getrusage(RUSAGE_SELF, &end);

   accum_ticks += elapsed(&beg.ru_utime, &end.ru_utime);
   // And if you want to include system time as well, add:
   accum_ticks += elapsed(&beg.ru_stime, &end.ru_stime);
}

u64 elapsed_CPU_us()
{
   return accum_ticks; // It's already stored natively in microseconds
}

On Linux 2.6.26+, you can replace RUSAGE_SELF with RUSAGE_THREAD to limit the stats to the resources used by the calling thread alone, not the whole process (which helps if other threads are doing unrelated work and you don't want their stats polluting yours), in exchange for less portability.

Yes, it's a little more work to compute the time (two additions/subtractions and one multiplication by a constant, doubled if you want both user and system time, where clock in the simplest usage is a single subtraction), but:

  1. Handling clock wraparound adds more work (and a branch, which this code doesn't have; admittedly, it's a fairly predictable branch), narrowing the gap
  2. Integer multiplication is roughly as cheap as addition and subtraction on modern chips (recent x86-64 chips can issue an integer multiply every cycle), so you're not adding orders of magnitude more work, and in exchange you get more control, more guarantees, and greater portability

CodePudding user response:

(u64)(end - beg) performs the subtraction in clock_t math, which is more likely to overflow than 64-bit math.

Suggest using long long math in the subtraction.

// unsigned long long accum_ticks = 0;
// ...
// accum_ticks += (u64)(end - beg);

long long accum_ticks = 0;
...
accum_ticks += 0LL + end - beg;

To cope with clock_t sometimes wrapping around, we need to determine a CLOCK_MAX that works for a signed or unsigned clock_t. Note that clock_t may be a floating-point type, in which case the approach below is problematic.

#define CLOCK_MAX _Generic(((clock_t) 0), \
  unsigned long: ULONG_MAX/2, \
  long: LONG_MAX, \
  unsigned: UINT_MAX/2, \
  int: INT_MAX, \
  unsigned short: USHRT_MAX/2, \
  short: SHRT_MAX \
  )


long long accum_ticks = 0;
...
long long diff = 0LL + end - beg;
if (diff < 0) {
  diff += 1LL + CLOCK_MAX + CLOCK_MAX;
}
accum_ticks += diff;

This works if the interval between calls is less than or equal to 1 "wrap".
