Home > database >  Why is time passed being incorrect calculated across processes?
Why is time passed being incorrect calculated across processes?

Time:12-03

I am writing a program that sends a signal in one process and receives it in a thread in another. I have the entire program written with signals being caught and handled, as well as any synchronization issues. The problem is, I am trying to log the time the signal was sent and the time the signal was received. Though the values across the process vary strangely.

Here is how I did it.

I have a header file header.h which includes a shared global extern struct timespec begin, end;. The reason I made these shared was that I would need the beginning time to calculate the time elapsed since the program began.

Here is how I calculate the time elapsed.

I am using the POSIX clock_gettime(). I start the program and begin the timer, then when a signal is sent I run:

clock_gettime(CLOCK_REALTIME, &end);
long seconds = end.tv_sec - begin.tv_sec;
long nanoseconds = end.tv_nsec - begin.tv_nsec;
double elapsed = seconds   nanoseconds * 1e-9;

This all occurs in the main program. The second process is another program which is exec() in a child process and that is where the signal catch occurs. When I catch the signal, I store some data about it in a struct and store it in a buffer for another thread to read and log from.

typedef struct
{
    int sig;
    double time;
    long int tid;
} data;

Here's what I do in one of the threads:

data d;
d.sig = 2;
d.tid = pthread_self();
clock_gettime(CLOCK_REALTIME, &end);
long seconds = end.tv_sec - begin.tv_sec;
long nanoseconds = end.tv_nsec - begin.tv_nsec;
double elapsed = seconds   nanoseconds * 1e-9;
d.time = elapsed;
put(d);

The problem is my outputs are vastly different. In my sentlog.txt the time is represented correctly, with enough precision to see a difference.

SIGUSR2 sent at 1.000286 seconds
SIGUSR2 sent at 1.082671 seconds
SIGUSR2 sent at 1.155440 seconds
SIGUSR1 sent at 1.250770 seconds
SIGUSR1 sent at 1.314637 seconds
SIGUSR2 sent at 1.398995 seconds
SIGUSR1 sent at 1.460559 seconds
SIGUSR2 sent at 1.498223 seconds
SIGUSR2 sent at 1.577555 seconds
SIGUSR1 sent at 1.618036 seconds
SIGUSR2 sent at 1.684488 seconds
SIGUSR2 sent at 1.743165 seconds
SIGUSR2 sent at 1.780100 seconds
SIGUSR2 sent at 1.871603 seconds
SIGUSR1 sent at 1.901293 seconds
SIGUSR2 sent at 1.944139 seconds
SIGUSR1 sent at 1.984142 seconds
SIGUSR1 sent at 2.040130 seconds

While the receivelog.txt is not.

Here is how I log to the file and stdout

if (d.sig == 1)
        {
            printf("SIGUSR1 received by thread %ld at time %f\n", d.tid, d.time);
            fflush(stdout);
            fprintf(fpRecieve, "Thread %ld received SIGUSR1 at %f seconds\n", d.tid, d.time);
            fflush(fpRecieve);
        }
        else if (d.sig == 2)
        {
            printf("SIGUSR2 received by thread %ld at time %f\n", d.tid, d.time);
            fflush(stdout);
            fprintf(fpRecieve, "Thread %ld received SIGUSR2 at %f seconds\n", d.tid, d.time);
            fflush(fpRecieve);
        }
Thread 139995363964672 received SIGUSR2 at 1670008328.531628 seconds
Thread 139995363964672 received SIGUSR2 at 1670008328.613999 seconds
Thread 139995363964672 received SIGUSR2 at 1670008328.686767 seconds
Thread 139995372357376 received SIGUSR1 at 1670008328.782099 seconds
Thread 139995372357376 received SIGUSR1 at 1670008328.845975 seconds
Thread 139995363964672 received SIGUSR2 at 1670008328.930328 seconds
Thread 139995372357376 received SIGUSR1 at 1670008328.991889 seconds
Thread 139995363964672 received SIGUSR2 at 1670008329.029554 seconds
Thread 139995363964672 received SIGUSR2 at 1670008329.108883 seconds
Thread 139995372357376 received SIGUSR1 at 1670008329.149364 seconds
Thread 139995363964672 received SIGUSR2 at 1670008329.215814 seconds
Thread 139995363964672 received SIGUSR2 at 1670008329.274493 seconds
Thread 139995363964672 received SIGUSR2 at 1670008329.311425 seconds
Thread 139995363964672 received SIGUSR2 at 1670008329.402932 seconds
Thread 139995372357376 received SIGUSR1 at 1670008329.432621 seconds
Thread 139995363964672 received SIGUSR2 at 1670008329.475466 seconds

Why can I not simply just use the same operation as before?

CodePudding user response:

There are several issues with the way you are measuring the elapsed time in your code.

  1. you are using the CLOCK_REALTIME clock to measure the elapsed time, but this clock is not guaranteed to have a constant rate or be monotonic. This means that the value returned by clock_gettime() using this clock may not accurately reflect the actual elapsed time, and it may even go backwards if the system time is adjusted. This is likely the cause of the large and incorrect values you are seeing in your receivelog.txt file.

  2. you are subtracting the tv_sec and tv_nsec fields of the struct timespec values directly to calculate the elapsed time. This is not always correct, because the tv_nsec field may be greater than or equal to one billion (1000000000), in which case you will not get the correct elapsed time. Instead, you should use the timespec_get() function to calculate the elapsed time in seconds, which will handle this automatically for you.

  3. you are not properly accounting for the fact that the end time may be earlier than the begin time. This can happen if the system time is adjusted, or if the end time is measured on a different CPU than the begin time. In this case, you will get a negative elapsed time, which is not correct. You should check for this condition and handle it properly.

Here is how you can fix these issues and properly measure the elapsed time in your code:

#include <time.h>
#include <stdio.h>

// Use the CLOCK_MONOTONIC clock instead of CLOCK_REALTIME
#define CLOCK CLOCK_MONOTONIC

struct timespec begin, end;

int main()
{
  // Start the timer
  clock_gettime(CLOCK, &begin);

  // Do some work

  // Measure the elapsed time
  clock_gettime(CLOCK, &end);

  // Use timespec_get() to calculate the elapsed time in seconds
  double elapsed = end.tv_sec - begin.tv_sec  
    (end.tv_nsec - begin.tv_nsec) * 1e-9;

  // Check for negative elapsed time and handle it properly
  if (elapsed < 0)
    elapsed = 0;

  // Print the elapsed time
  printf("Elapsed time: %f seconds\n", elapsed);

  return 0;
}

This code uses the CLOCK_MONOTONIC clock, which is guaranteed to be monotonic and have a constant rate. It also uses the timespec_get() function to calculate the elapsed time in seconds, and it checks for and handles negative elapsed time properly. This should give you accurate and correct elapsed time measurements.

CodePudding user response:

I have a header file header.h which includes a shared global extern struct timespec begin, end;. The reason I made these shared was that I would need the beginning time to calculate the time elapsed since the program began.

The end does not need to be global (and should not be). Only begin needs to be global.

When end is global, multiple threads can access it at the same time. This is a race condition and is UB (undefined behavior).

Make end a function scoped variable.


You're not showing the code for put or the [ring?] queue definition.

Access to it should be with a mutex or stdatomic.h primitives.

Although a bit trickier to implement, I usually prefer the atomic functions.


Also, I agree that the code should be using CLOCK_MONOTONIC.

  • Related