Improving the performance of nested loops in C-CodePudding

Given a list of spheres described by (x_i, y_i, r_i), meaning the center of sphere i is at the point (x_i, y_i, 0) in three-dimensional space and its radius is r_i, I want to compute all z_i where z_i = max { z | (x_i, y_i, z) is a point on any sphere }. In other words, z_i is the highest point over the center of sphere i that is in any of the spheres.

I have two arrays

int **vs = (int **)malloc(num * sizeof(int *));
double **vh = (double **)malloc(num * sizeof(double *));
for (int i = 0; i < num; i  ){
    vs[i] = (int *)malloc(2 * sizeof(int)); // x,y
    vh[i] = (double *)malloc(2 * sizeof(double)); r,z
}

The objective is to calculate the maximum z for each point. Thus, we should check if there are larger spheres over each x,y point. Initially we see vh[i][1]=vh[i][0] for all points, which means that z is the r of each sphere. Then, we check if these z values are inside larger spheres to maximize the z value.

for (int i = 0; i < v; i  ) {
    double a = vh[i][0] * vh[i][0]; // power of the radius of sphere #1

    for (int j = 0; j < v; j  ) {
        if (vh[i][0] > vh[j][1]) { // check only if r of sphere #1 is larger than the current z of #2

            double b = a - (vs[j][0] - vs[i][0]) * (vs[j][0] - vs[i][0])
                         - (vs[j][1] - vs[i][1]) * (vs[j][1] - vs[i][1]);
            // calculating the maximum z value of sphere #2 crossing sphere #1 
            // (r of sphere #1)**2 = (z of x_j,y_j)**2   (distance of two centers)**2

            if (b > vh[j][1] * vh[j][1]) {
                vh[j][1] = sqrt(b);// update the z value if it is larger than the current value
            }
        }
    }
}

it works perfectly, but the nested loop is very slow when the number of points increases. I look for a way to speed up the process.

An illustration for the clarification of the task

CodePudding user response：

When you say

The objective is to calculate the maximum z for each point.

I take you to mean, for the center C of each sphere, the maximum z coordinate among all the points lying directly above C (along the z axis) on any of the spheres. This is fundamentally an O(n²) problem -- there is nothing you can do to prevent the computational expense scaling with the square of the number of spheres.

But there may be some things you can do to reduce the scaling coeffcient. Here are some possibilities:

Use bona fide 2D arrays (== arrays of arrays) instead arrays of pointers. It's easier to implement, more memory-efficient, and better for locality of reference:
```
int (*vs)[2] = malloc(num * sizeof(*vs));
double (*vh)[2] = malloc(num * sizeof(*h));
// no other allocations needed
```
Alternatively, it may help to use an array of structures, one per sphere, instead of two 2D arrays of numbers. It would certainly make your code clearer, but it might also help give a slight speed boost by improving locality of reference:
```
struct sphere {
    int x, y;
    double r, z;
};
struct sphere *spheres = malloc(num * sizeof(*spheres));
```
Store z² instead of z, at least for the duration of the computation. This will reduce the number of somewhat-expensive sqrt calls from O(v²) to O(v), supposing you make a single pass at the end to convert all the results to zs, and it will save you O(v²) multiplications, too. (More if you could get away without ever converting from z² to z.)
Pre-initialize each vh[i][1] value to the radius of sphere i (or the square of the radius if you are exercising the previous option, too), and add j != i to the condition around the inner-loop body.
Sorting the spheres in decreasing order by radius may help you find larger provisional z values earlier, and therefore to make the radius test in the inner loop more effective at culling unnecessary computations.
You might get some improvement by checking each distinct pair only once. That is, for each unordered pair i, j, you can compute the inter-center distance once only, determine from the relative radii which height to check for a possible update, and go from there. The extra logic involved might or might not pay off through a reduction in other computations.

Additionally, if you are doing this for large enough inputs, then you might be able to reduce the wall time consumed by parallelizing the computation.

Note, by the way, that this comment is incorrect:

            // (r of sphere #1)**2 = (r of sphere #2)**2   (distance of two centers)**2

. However, it also not what you are relying upon. What you are relying upon is that if sphere 1 covers the center of sphere 2 at all, then its height, z, above the center of sphere 2 satisfies the relationship

r₁² = z² d_1,2²

. That is, where you wrote r of sphere #2 in the comment, you appear to have meant z.