Why is the euclidean distance function slower in c than in java?-CodePudding

I implemented and bench marked the following functions in c and java using the following code. For c, I'm getting about 1.688852 seconds while for java, it only takes like 0.355038 seconds. Even if I remove the sqrt function, inline the code manually or change function signature to accept 6 double coordinates (to avoid accessing via pointers) for c time lapsed doesn't improve much.

I'm compiling the c program like cc -O2 main.c -lm. For java, I'm running the application in intellij idea with the default jvm options (java 8, openjdk).

#include <math.h>
#include <stdio.h>
#include <time.h>

typedef struct point3d
{
  double x;
  double y;
  double z;
} point3d_t;

double distance(point3d_t *from, point3d_t *to);

int main(int argc, char const *argv[])
{
  point3d_t from = {.x = 2.3, .y = 3.45, .z = 4.56};
  point3d_t to = {.x = 5.678, .y = 3.45, .z = -9.0781};

  double time = 0.0;
  int count = 10000000;

  for (size_t i = 0; i < count; i  )
  {
    clock_t tic = clock();
    double d = distance(&from, &to);
    clock_t toc = clock();
    time  = ((double) (toc - tic) / CLOCKS_PER_SEC);
  }

  printf("Elapsed: %f seconds\n", time);

  return 0;
}

double distance(point3d_t *from, point3d_t *to)
{
  double dx = to->x - from->x;
  double dy = to->y - from->y;
  double dz = to->z - from->z;

  double d2 = (dx * dx)   (dy * dy)   (dz   dz);
  return sqrt(d2);
}

java:

public class App 
{
    static Random rnd = new Random();

    public static void main( String[] args )
    {
        var sw = new StopWatch();
        var time = 0.0;
        var count = 10000000;

        for (int i = 0; i < count; i  ) {
            var from = Vector3D.of(rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble());
            var to = Vector3D.of(rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble());

            sw.start();
            var dist = distance(from, to);
            sw.stop();
            time  = sw.getTime(TimeUnit.NANOSECONDS);
            sw.reset();
        }

        System.out.printf("Time: %f seconds\n", time / 1e09);
    }

    public static double distance(Vector3D from, Vector3D to) {
        var dx = to.getX() - from.getX();
        var dy = to.getY() - from.getY();
        var dz = to.getZ() - from.getZ();

        return Math.sqrt((dx * dx)   (dy * dy)   (dz * dz));
    }
}

My objective is to understand why the c program is slower and get it to work faster than the java program.

EDIT: I'm using random values in the java program to try and make sure jvm doesn't do anything funny like caching the result and sidestep computations altogether.

EDIT: Updating the two snippets for c to use clock_gettime(), to clock the time taken for all the loops rather than the method call and also to not toss the result from the method call:

#include <math.h>
#include <stdio.h>
#include <time.h>

typedef struct point3d
{
  double x;
  double y;
  double z;
} point3d_t;

double distance(point3d_t *from, point3d_t *to);

int main(int argc, char const *argv[])
{
  point3d_t from = {.x = 2.3, .y = 3.45, .z = 4.56};
  point3d_t to = {.x = 5.678, .y = 3.45, .z = -9.0781};

  struct timespec fs;
  struct timespec ts;

  long time = 0;
  int count = 10000000;
  double dist = 0;

  clock_gettime(CLOCK_REALTIME, &fs);

  for (size_t i = 0; i < count; i  )
  {
    dist = distance(&from, &to);
  }

  clock_gettime(CLOCK_REALTIME, &ts);
  time = ts.tv_nsec - fs.tv_nsec;

  if (dist == 0.001)
  {
    printf("hello\n");
  }

  printf("Elapsed: %f sec\n", (double) time / 1e9);

  return 0;
}

double distance(point3d_t *from, point3d_t *to)
{
  double dx = to->x - from->x;
  double dy = to->y - from->y;
  double dz = to->z - from->z;

  double d2 = (dx * dx)   (dy * dy)   (dz   dz);
  return sqrt(d2);
}

java:

public class App 
{
    static Random rnd = new Random();

    public static void main( String[] args )
    {
        var from = Vector3D.of(rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble());
        var to = Vector3D.of(rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble());

        var time = 0.0;
        var count = 10000000;
        double dist = 0.0;

        var start = System.nanoTime();
        for (int i = 0; i < count; i  ) {
            dist = distance(from, to);
        }

        var end = System.nanoTime();
        time = end - start;

        if (dist == rnd.nextDouble()) {
            System.out.printf("hello! %f\n", dist);
        }

        dist = dist   1;
        System.out.printf("Time: %f sec\n", (double) time / 1e9);
        System.out.printf("Yohoo! %f\n", dist);
    }

    public static double distance(Vector3D from, Vector3D to) {
        var dx= to.getX() - from.getX();
        var dy = to.getY() - from.getY();
        var dz = to.getZ() - from.getZ();

        return Math.sqrt((dx * dx)   (dy * dy)   (dz * dz));
    }
}

Compiling c code using gcc -Wall -std=gnu99 -O2 main.c -lm. The results now are 0.06323 seconds for c code and 0.006325 seconds for java.

EDIT: As Jérôme Richard and Peter Cordes pointed out my bench marking is just flawed, not to mention I was taking the sqrt of a negative number in the c version. So, as soon as I compiled the c program with -fno-math-errno, it clocked 0 seconds. I compiled c program like gcc -O2 -std=gnu99 main.c -lm. Now the c program is clocking effectively zero seconds (271 ns) while java clocking 0.007217 seconds. Everything's in order :)

Below is the final code:

#include <math.h>
#include <stdio.h>
#include <time.h>

typedef struct point3d
{
  double x;
  double y;
  double z;
} point3d_t;

double distance(point3d_t *from, point3d_t *to);

int main(int argc, char const *argv[])
{
  point3d_t from = {.x = 2.3, .y = 3.45, .z = 4.56};
  point3d_t to = {.x = 5.678, .y = 3.45, .z = -9.0781};

  struct timespec fs;
  struct timespec ts;

  long time = 0;
  int count = 10000000;
  double dist = 0;

  clock_gettime(CLOCK_REALTIME, &fs);

  for (size_t i = 0; i < count; i  )
  {
    dist = distance(&from, &to);
  }

  clock_gettime(CLOCK_REALTIME, &ts);
  time = ts.tv_nsec - fs.tv_nsec;

  printf("hello %f \n", dist);
  printf("Elapsed: %f ns\n", (double) time);
  printf("Elapsed: %f sec\n", (double) time / 1e9);

  return 0;
}

double distance(point3d_t *from, point3d_t *to)
{
  double dx = (to->x) - (from->x);
  double dy = (to->y) - (from->y);
  double dz = (to->z) - (from->z);

  double d2 = (dx * dx)   (dy * dy)   (dz * dz);
  return sqrt(d2);
}

java:

public class App 
{
    static Random rnd = new Random();

    public static void main( String[] args )
    {
        var from = Vector3D.of(2.3, 3.45, 4.56);
        var to = Vector3D.of(5.678, 3.45, -9.0781);

        var time = 0.0;
        var count = 10000000;
        double dist = 0.0;

        var start = System.nanoTime();
        for (int i = 0; i < count; i  ) {
            dist = distance(from, to);
        }

        var end = System.nanoTime();
        time = end - start;

        System.out.printf("Yohoo! %f\n", dist);
        System.out.printf("Time: %f ns\n", (double) time / 1e9);
    }

    public static double distance(Vector3D from, Vector3D to) {
        var dx = to.getX() - from.getX();
        var dy = to.getY() - from.getY();
        var dz = to.getZ() - from.getZ();

        var d2 =  (dx * dx)   (dy * dy)   (dz * dz);
        return Math.sqrt(d2);
    }
}

CodePudding user response：

First of all, the method used to measure the timings is very imprecise. The current methods introduce a huge bias that is probably bigger than the measure itself. Indeed, clock is not very precise on many platforms (about 1 ms on my machine and often not better than 1 us on almost all platforms). Moreover, the imprecision is strongly amplified by the 10,000,000 iterations. If you want to measure the loop precisely, you need to move the clock call outside the loop (and if possible use a more accurate measurement function).

Still, the main problem is that the function result is unused and the Java JIT can see that and partially optimize that. GCC cannot because of the math function standard behaviour (errno cause a side effect not available in the Java code). You can disable the use of errno using the command-line flag -fno-math-errno. With that GCC can now totally optimize the function (ie. remove the function call) and the resulting time is much smaller. However, the benchmark is flawed and you probably do not want to measure that. If you want to write a correct benchmark, you need to read the computed value. For example, you can compute a checksum, at least to check the results are correct/equivalent.