Why does this simple program, compiled with gcc -mfpmath=387 and an optimization level of -O2 or -O3, produce a NaN result?

I have a short program that performs a numerical computation and produces an incorrect NaN result when some specific conditions hold. I cannot see how this NaN result can arise. Note that I am not using compiler options that allow the reordering of arithmetic operations, such as -ffast-math.

Question: I am looking for an explanation of how the NaN result arises. Mathematically, there is nothing in the computation that leads to division by zero or similar. Am I missing something obvious?

Note that I am not asking how to fix the problem—that is easy. I am simply looking for an understanding of how the NaN appears.

Minimal example

Note that this example is very fragile and even minor modifications, such as adding printf() calls in the loop to observe values, will change the behaviour. This is why I was unable to minimize it further.

// prog.c

#include <stdio.h>
#include <math.h>

typedef long long myint;

void fun(const myint n, double *result) {
    double z = -1.0;
    double phi = 0.0;
    for (myint i = 0; i < n; i++) {
        double r = sqrt(1 - z*z);

        /* avoids division by zero when r == 0 */
        if (i != 0 && i != n-1) {
            phi += 1.0 / r;
        }

        double x = r*cos(phi);
        double y = r*sin(phi);

        result[i + n*0] = x;
        result[i + n*1] = y;
        result[i + n*2] = z;

        z += 2.0 / (n - 1);
    }
}

#define N 11

int main(void) {
    // perform computation
    double res[3*N];
    fun(N, res);

    // output result
    for (int i=0; i < N; i++) {
        printf("%g %g %g\n", res[i+N*0], res[i+N*1], res[i+N*2]);
    }

    return 0;
}

Compile with:

gcc -O3 -mfpmath=387 prog.c -o prog -lm

The last line of the output is:

nan nan 1

Instead of NaN, I expect a number close to zero.

Critical features of the example

The following must all hold for the NaN output to appear:

Compile with GCC on an x86 platform. I was able to reproduce this with GCC 12.2.0 (from MacPorts) on macOS 10.14.6, as well as with GCC versions 9.3.0, 8.3.0 and 7.5.0 on Linux (openSUSE Leap 15.3).

I cannot reproduce it with GCC 10.2.0 or later on Linux, or GCC 11.3.0 on macOS.
Choose to use x87 instructions with -mfpmath=387, and an optimization level of -O2 or -O3.
myint must be a signed 64-bit type.
Thinking of result as an n-by-3 matrix, it must be stored in column-major order.
No printf() calls in the main loop of fun().

Without these features, I do get the expected output, i.e. something like 1.77993e-08 -1.12816e-08 1 or 0 0 1 as the last line.

Explanation of the program

Even though it doesn't really matter to the question, I give a short explanation of what the program does, to make it easier to follow. It computes the x, y, z three-dimensional coordinates of n points on the surface of a sphere in a specific arrangement. The z values go from -1 to 1 in equal increments; however, the last value won't be precisely 1 due to numerical round-off errors. The coordinates are written into an n-by-3 matrix, result, stored in column-major order. r and phi are polar coordinates in the (x, y) plane.

Note that when z is -1 or 1 then r becomes 0. This happens in the first and last iteration steps. This would lead to division by 0 in the 1.0 / r expression. However, 1.0 / r is excluded from the first and last iteration of the loop.

CodePudding user response：

This is caused by interplay of x87 80-bit internal precision, non-conforming behavior of GCC, and optimization decisions differing between compiler versions.

x87 supports IEEE binary32 and binary64 only as storage formats, converting to/from its 80-bit representation on loads/stores. To make program behavior predictable, the C standard requires that extra precision is dropped on assignments, and allows checking the intermediate precision via the FLT_EVAL_METHOD macro. With -mfpmath=387, FLT_EVAL_METHOD is 2, so you know that intermediate precision corresponds to the long double type.

Unfortunately, GCC does not drop extra precision on assignments, unless you're requesting stricter conformance via -std=cNN (as opposed to -std=gnuNN), or explicitly passing -fexcess-precision=standard.
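
To see which evaluation method a given build uses, a minimal sketch like the following may help. The helper names eval_method_name and print_eval_method are made up for this illustration; only FLT_EVAL_METHOD itself comes from <float.h>.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: turn a FLT_EVAL_METHOD value into a description. */
const char *eval_method_name(int method) {
    switch (method) {
    case 0:  return "each operation evaluated in the precision of its own type";
    case 1:  return "float and double operations evaluated as double";
    case 2:  return "all operations evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Hypothetical helper: report the evaluation method of this build. */
void print_eval_method(void) {
    int method = (int)FLT_EVAL_METHOD;
    printf("FLT_EVAL_METHOD = %d (%s)\n", method, eval_method_name(method));
}
```

Called from main in a build with -mfpmath=387, this should report 2, as described above; a default x86-64 build typically reports 0.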

In your program, the z += 2.0 / (n - 1); statement should be computed by:

Computing 2.0 / (n - 1) in the intermediate 80-bit precision.
Adding to previous value of z (still in the 80-bit precision).
Rounding to the declared type of z (i.e. to binary64).

In the version that ends up with NaNs, GCC instead does the following:

Computes 2.0 / (n - 1) just once before the loop.
Rounds this fraction from binary80 to binary64 and stores on stack.
In the loop, it reloads this value from stack and adds to z.

This is non-conforming, because the 2.0 / (n - 1) undergoes rounding twice (first to binary80, then to binary64).
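
The two sequences above can be imitated in portable C by spelling out the conversions with long double. This is only a model under stated assumptions, not what GCC actually emits: final_z_conforming and final_z_hoisted are hypothetical helpers, n is assumed to be at least 2, and on platforms where long double is not the x87 80-bit format the numbers will not reflect x87 behaviour. Whether GCC also keeps z itself in extended precision is not modelled here.

```c
/* Model of the conforming evaluation with FLT_EVAL_METHOD == 2:
   quotient and sum stay in long double, and the only rounding to
   binary64 happens on the assignment to z. Assumes n >= 2. */
double final_z_conforming(long long n) {
    double z = -1.0;
    for (long long i = 0; i + 1 < n; i++) {
        long double sum = (long double)z + 2.0L / (long double)(n - 1);
        z = (double)sum;  /* single rounding to the declared type */
    }
    return z;
}

/* Model of the hoisted evaluation: the quotient is computed once,
   rounded to binary64 (the extra rounding), and then reused. */
double final_z_hoisted(long long n) {
    double inc = (double)(2.0L / (long double)(n - 1));
    double z = -1.0;
    for (long long i = 0; i + 1 < n; i++) {
        z = (double)((long double)z + inc);
    }
    return z;
}
```

Both functions return the value z holds in the last loop iteration of the original program, i.e. after n - 1 increments. Comparing the two results for the same n on an x87 build is one way to look for cases where the extra rounding of the increment changes the outcome.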

The above explains why you saw different results depending on compiler version and optimization level. However, in general you cannot expect your computation to not produce NaNs in the last iteration. When n - 1 is not a power of two, 2.0 / (n - 1) is not exactly representable and may be rounded up. In that case, z may grow a bit faster than the true sum -1.0 + 2.0 / (n - 1) * i, and may end up above 1.0 for i == n - 1, causing sqrt(1 - z*z) to produce a NaN due to a negative argument.

In fact, if you change #define N 11 to #define N 12 in your program, you will deterministically get a NaN both with 80-bit and 64-bit intermediate precision.
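
The N = 12 case can be checked without any x87 involvement. The sketch below repeats the z update in plain binary64; final_z_binary64 is a hypothetical helper, and the volatile qualifiers are an assumption about how to make the compiler store each value as a double even when it would otherwise keep extra precision.

```c
/* Hypothetical helper: value of z in the last loop iteration when every
   intermediate result is rounded to binary64. Assumes n >= 2. */
double final_z_binary64(long long n) {
    /* volatile forces a store (and thus a rounding to double) each time */
    volatile double inc = 2.0 / (double)(n - 1);
    volatile double z = -1.0;
    for (long long i = 0; i + 1 < n; i++) {
        z = z + inc;
    }
    return z;
}
```

With n = 12 the returned value should compare greater than 1.0, so sqrt(1 - z*z) receives a negative argument. When n - 1 is a power of two, for example n = 5, every step is exact and the result is exactly 1.0.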

CodePudding user response：

... how the NaN result arises (?)

Even though better adherence to the C spec may apparently solve OP's immediate problem, I assert other prevention practices should be considered.

sqrt(1 - z*z) is a candidate NaN when |z| > 1.0.

The index test may not be enough to prevent division by zero: if r is 0 in any other iteration, 1.0 / r is infinite, which leads to cos(INFINITY), another NaN possibility.

// /* avoids division by zero when r == 0 */
//    if (i != 0 && i != n-1) {
//        phi += 1.0 / r;
//    }

To avoid these, 1) test directly and 2) use a more precise approach.

if (r) {
  phi += 1.0 / r;
}

// double r = sqrt(1 - z*z);
double rr = (1-z)*(1+z);  // More precise than 1 - z*z
double r = rr < 0.0 ? 0.0 : sqrt(rr);