For statistical purposes I want to accumulate the total CPU time used by a function of a program, in microseconds. It must work on two systems, one where sizeof(clock_t) = 8
(RedHat) and another where sizeof(clock_t) = 4
(AIX). On both machines clock_t
is a signed integer type and CLOCKS_PER_SEC = 1000000
(i.e. one tick per microsecond, but I don't make that assumption in the code and use the macro instead).
What I have is equivalent to something like this (but encapsulated in some fancy classes):
typedef unsigned long long u64;

u64 accum_ticks = 0;

void f()
{
    clock_t beg = clock();
    work();
    clock_t end = clock();
    accum_ticks += (u64)(end - beg); // (1)
}

u64 elapsed_CPU_us()
{
    return accum_ticks * 1e6 / CLOCKS_PER_SEC;
}
But on the 32-bit AIX machine, where clock_t is an int, it will overflow after 35m47s. Suppose that in some call beg equals 35m43s since the program started, and work() takes 10 CPU-seconds, causing end to overflow. Can I trust line (1) for this and subsequent calls to f() from then on? f() is guaranteed never to take more than 35 minutes of execution, of course.
In case I can't trust line (1) at all, even on my particular machine, what alternatives do I have that don't involve importing any third-party library? (I can't copy libraries onto the system, and I can't use <chrono> because it isn't available on our AIX machines.)
NOTE: I can use kernel headers, and the precision I need is microseconds.
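To make the hypothetical numbers above concrete, here is a minimal sketch of what line (1) would see after the wrap. It assumes the usual two's-complement wraparound, which is exactly what the C standard does not guarantee for a signed clock_t; the values are illustrative only, not part of the real program:
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical values: beg is 35m43s after program start, work() takes
       10 CPU-seconds, and one tick is one microsecond. */
    int64_t beg_true = 2143000000;           /* (35*60 + 43) seconds, in ticks */
    int64_t end_true = beg_true + 10000000;  /* 2153000000 > INT32_MAX         */

    /* A 32-bit signed clock_t cannot represent end_true.  On a machine that
       wraps two's-complement style it comes back as this negative value
       (that wrap is NOT guaranteed; signed overflow is undefined behaviour): */
    int32_t beg = (int32_t)beg_true;                      /*  2143000000 */
    int32_t end = (int32_t)(end_true - (1LL << 32));      /* -2141967296 */

    /* Doing the subtraction in 64-bit shows the result is short by one wrap: */
    int64_t diff = (int64_t)end - beg;                    /* -4284967296 */
    printf("diff        = %" PRId64 "\n", diff);
    printf("diff + 2^32 = %" PRId64 "\n", diff + (1LL << 32));  /* 10000000 */

    /* Line (1) instead computes (end - beg) in clock_t math, which overflows
       again; if the hardware happens to wrap, the two wraps cancel and the
       cast yields 10000000, but nothing in the language guarantees that. */
    return 0;
}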
CodePudding user response:
An alternate suggestion: Don't use clock
. It's so underspecified it's nigh impossible to write code that will work fully portably, handling possible wraparound for 32 bit integer clock_t
, integer vs. floating point clock_t
, etc. (and by the time you write it, you've written so much ugliness you've lost whatever simplicity clock
provided).
Instead, use getrusage
. It's not perfect, and it might do a little more than you strictly need, but:
- The times it returns are guaranteed to operate relative to 0 (where the value returned by clock at the beginning of a program could be anything)
- It lets you specify if you want to include stats from child processes you've waited on (clock either does or doesn't, in a non-portable fashion)
- It separates the user and system CPU times; you can use either one, or both, your choice
- Each time is expressed explicitly in terms of a pair of values: a time_t number of seconds, and a suseconds_t number of additional microseconds. Since it doesn't try to encode a total microsecond count into a single time_t/clock_t (which might be 32 bits), wraparound can't occur until you've hit at least 68 years of CPU time (if you manage that on a system with a 32-bit time_t, I want to know your IT folks; the only way I can imagine hitting that is on a system with hundreds of cores running for weeks, and any such system would be 64-bit at this point)
- The parts of the result you need are specified by POSIX, so it's portable to just about everywhere but Windows (where you're stuck writing preprocessor-controlled code to switch to GetProcessTimes when compiled for Windows)
Conveniently, since you're on POSIX systems (I think?), clock
is already expressed as microseconds, not real ticks (POSIX specifies that CLOCKS_PER_SEC
equals 1000000), so the values already align. You can just rewrite your function as:
#include <sys/time.h>
#include <sys/resource.h>
static inline u64 elapsed(const struct timeval *beg, const struct timeval *end)
{
    return (end->tv_sec - beg->tv_sec) * 1000000ULL + end->tv_usec - beg->tv_usec;
}

void f()
{
    struct rusage beg, end;
    // Not checking return codes, because the only two documented failure cases are
    // passing an unmapped memory address for the struct, or an invalid who flag,
    // neither of which we're doing; easily verified by inspection
    getrusage(RUSAGE_SELF, &beg);
    work();
    getrusage(RUSAGE_SELF, &end);
    accum_ticks += elapsed(&beg.ru_utime, &end.ru_utime);
    // And if you want to include system time as well, add:
    accum_ticks += elapsed(&beg.ru_stime, &end.ru_stime);
}

u64 elapsed_CPU_us()
{
    return accum_ticks; // It's already stored natively in microseconds
}
On Linux 2.6.26+, you can replace RUSAGE_SELF with RUSAGE_THREAD to limit the stats to the resources used by the calling thread alone, not the whole calling process (which might help if other threads are doing unrelated work and you don't want their stats polluting yours), in exchange for less portability.
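For instance, a sketch of how that swap might be guarded so the same code still builds on AIX or older Linux, where RUSAGE_THREAD doesn't exist (the RUSAGE_WHO macro name is my own; elapsed(), accum_ticks and work() are the ones from the snippet above):
/* On glibc, RUSAGE_THREAD is only exposed when _GNU_SOURCE is defined. */
#define _GNU_SOURCE
#include <sys/time.h>
#include <sys/resource.h>

/* Prefer per-thread accounting where available, fall back to whole-process. */
#ifdef RUSAGE_THREAD
#  define RUSAGE_WHO RUSAGE_THREAD
#else
#  define RUSAGE_WHO RUSAGE_SELF
#endif

void f()
{
    struct rusage beg, end;
    getrusage(RUSAGE_WHO, &beg);
    work();
    getrusage(RUSAGE_WHO, &end);
    accum_ticks += elapsed(&beg.ru_utime, &end.ru_utime);
}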
Yes, it's a little more work to compute the time (two additions/subtractions, one multiplication by a constant, doubled if you want both user and system time, where clock in the simplest usage is a single subtraction), but:
- Handling clock wraparound adds more work (and branching work, which this code doesn't have; admittedly, it's a fairly predictable branch), narrowing the gap
- Integer multiplication is roughly as cheap as addition and subtraction on modern chips (recent x86-64 chips can sustain an integer multiply every clock cycle), so you're not adding orders of magnitude more work, and in exchange you get more control, more guarantees, and greater portability
CodePudding user response:
unsigned long long(end - beg)
subtracts using clock_t
math which is more likely to overflow than 64-bit math.
Suggest using long long
math in the subtraction.
//unsigned long long accum_ticks = 0;
//...
//accum_ticks += (unsigned long long)(end - beg);
long long accum_ticks = 0;
...
accum_ticks += 0LL + end - beg;
To cope with clock_t sometimes wrapping around, we need to determine a CLOCK_MAX that works for a signed or unsigned clock_t. Note that clock_t may be a floating-point type, in which case the approach below is problematic.
#define CLOCK_MAX _Generic(((clock_t) 0), \
unsigned long: ULONG_MAX/2, \
long: LONG_MAX, \
unsigned: UINT_MAX/2, \
int: INT_MAX, \
unsigned short: USHRT_MAX/2, \
short: SHRT_MAX \
)
long long accum_ticks = 0;
...
long long diff = 0LL + end - beg;
if (diff < 0) {
    // end wrapped past CLOCK_MAX: add back one full period of clock_t (CLOCK_MAX*2 + 2)
    diff += 2LL + CLOCK_MAX + CLOCK_MAX;
}
accum_ticks += diff;
This works if the interval between calls is less than or equal to 1 "wrap".
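Assembled into one place, a minimal sketch of this approach could look like the following (f() and work() are the names from the question; the wrap constant CLOCK_MAX*2 + 2 is one full period of an N-bit clock_t, assuming the implementation wraps at all):
#include <time.h>
#include <limits.h>

/* Largest value an integer clock_t can report before wrapping (signed view). */
#define CLOCK_MAX _Generic(((clock_t) 0), \
    unsigned long: ULONG_MAX/2, \
    long: LONG_MAX, \
    unsigned: UINT_MAX/2, \
    int: INT_MAX, \
    unsigned short: USHRT_MAX/2, \
    short: SHRT_MAX \
)

void work(void);               /* the function being measured, from the question */

static long long accum_ticks = 0;

void f(void)
{
    clock_t beg = clock();
    work();
    clock_t end = clock();

    long long diff = 0LL + end - beg;        /* subtract in 64-bit math, not clock_t math */
    if (diff < 0) {
        /* end wrapped past CLOCK_MAX: add back one full period of clock_t */
        diff += 2LL + CLOCK_MAX + CLOCK_MAX;
    }
    accum_ticks += diff;
}

long long elapsed_CPU_us(void)
{
    /* On POSIX, CLOCKS_PER_SEC is 1000000, so ticks are already microseconds. */
    return accum_ticks * 1000000LL / CLOCKS_PER_SEC;
}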