Standard math functions reproducibility on different CPUs


I am working on a project with a lot of math calculations. After switching to a new test machine, I noticed that a lot of tests failed. It is also important to note that the tests failed on my development machine and on some other developers' machines as well. After tracing values and comparing them with values from the old machine, I found that some functions (at this moment I have found only cosine) from math.h sometimes return slightly different values (for example: 40965.8966304650828827e-01 and 40965.8966304650828816e-01, or -3.3088623618085204e-08 and -3.3088623618085197e-08).
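For reference, a minimal sketch of how such differences can be made visible (the inputs below are placeholders, not my original test data; %.17g prints enough digits to round-trip a double exactly):

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Print cosine results with full precision so outputs from
       different machines can be diffed directly. */
    const double inputs[] = { 0.25, 1.5, 123456.789 };
    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; ++i)
        printf("cos(%.17g) = %.17g\n", inputs[i], cos(inputs[i]));
    return 0;
}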

New CPU: Intel Xeon Gold 6230R (Intel64 Family 6 Model 85 Stepping 7)

Old CPU: Exact model is unknown (Intel64 Family 6 Model 42 Stepping 7)

My CPU: Intel Core i7-4790K

Test results do not depend on the Windows version (7 and 10 were tested).

I have tried testing with a binary that was statically linked with the standard library, to rule out loading different libraries for different processes and Windows versions, but all results were the same.

The project is compiled with /fp:precise; switching to /fp:strict changed nothing.

MSVC from Visual Studio 2015 is used: 19.00.24215.1 for x64.

How to make calculations fully reproducible?

CodePudding user response:

Since you are on Windows, I am pretty sure the different results arise because the UCRT detects at runtime whether FMA3 (fused multiply-add) instructions are available on the CPU and, if so, uses them in transcendental functions such as cosine. This gives slightly different results. The solution is to place the call _set_FMA3_enable(0); at the very start of your main() or WinMain() function, as described here.
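For what it's worth, a minimal sketch of that workaround, assuming an x64 MSVC build where the CRT declares _set_FMA3_enable in <math.h> (if your headers do not, you may need to declare it yourself):

#include <math.h>   /* declares _set_FMA3_enable on x64 MSVC builds */
#include <stdio.h>

int main(void)
{
    /* Disable the FMA3 code paths in the UCRT before the first call to a
       transcendental function, so every machine takes the same
       (non-FMA3) implementation. */
    _set_FMA3_enable(0);

    printf("%.17g\n", cos(0.5));
    return 0;
}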

If you want to have reproducibility also between different operating systems, things become harder or even impossible. See e.g. this blog post.

In response also to the comments stating that you should just use some tolerance, I do not agree with this as a general statement. Certainly, there are many applications where this is the way to go. But I do think it can be a sensible requirement to get exactly the same floating-point results for some applications, at least when staying on the same OS (Windows, in this case).

In fact, we had the very same issue with _set_FMA3_enable a while ago. I am a software developer for a traffic simulation, and minor differences such as 10^-16 often build up and eventually lead to entirely different simulation results. Naturally, one is supposed to run many simulations with different seeds and average over all of them, making the different behavior irrelevant for the final result. But: sometimes customers have a problem at a specific simulation second for a specific seed (e.g. an application crash or incorrect behavior of an entity), and not being able to reproduce it on our developer machines due to a different CPU makes it much harder to diagnose and fix the issue.

Moreover, if the test system consists of a mixture of older and newer CPUs and test cases are not bound to specific machines, tests can sometimes deviate seemingly without reason (flaky tests). This is certainly not desired. Requiring exact reproducibility also makes writing the tests much easier, because you do not need heuristic thresholds (e.g. a tolerance or some guessed value for the number of samples).

Moreover, our customers expect the results to remain stable for a specific version of the program, since they calibrated (more or less...) their traffic networks to real data. This is somewhat questionable, since (again) one should actually look at averages, but the naive expectation usually wins in reality.

CodePudding user response:

IEEE-754 double-precision binary floating point provides only about 15–16 significant decimal digits of precision. You are looking at the "noise" of different library implementations and possibly different FPU implementations.
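To put the quoted numbers in perspective, here is a small sketch (using the values from the question) showing that the two results are only on the order of one ULP (unit in the last place) apart:

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* The two cosine results reported in the question. */
    double a = -3.3088623618085204e-08;
    double b = -3.3088623618085197e-08;

    double diff = fabs(a - b);
    /* Size of one ULP at this magnitude. */
    double ulp = nextafter(fabs(a), INFINITY) - fabs(a);

    printf("absolute difference: %g\n", diff);
    printf("difference in ULPs : %g\n", diff / ulp);
    return 0;
}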

How to make calculations fully reproducible?

That is an XY problem. The answer is: you can't. But it is the wrong question. You would do better to ask how you can implement valid and robust tests that are sympathetic to this well-known and unavoidable technical issue with floating-point representation. Without seeing the test code you are using, it is not possible to answer that directly.

Generally you should avoid comparing floating-point values for exact equality; instead, subtract the result from the expected value and test that the difference is within some acceptable tolerance for the precision of the FP type used. For example:

#include <math.h>      /* fabs */
#include <stdbool.h>   /* bool (when compiling as C) */

#define EXPECTED_RESULT  40965.8966304650
#define RESULT_PRECISION     0.0000000001   /* acceptable absolute tolerance */

double actual_result = test();   /* test() is the function under test */
bool error = fabs( actual_result - EXPECTED_RESULT ) > RESULT_PRECISION;
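If your expected values span different magnitudes, a relative tolerance often scales better than a fixed absolute one. A rough sketch (the helper name and tolerances here are made up, not part of the code above):

#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: two doubles compare equal when their difference is
   within rel_tol of the larger magnitude, or within abs_tol for values
   near zero. */
static bool nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    double diff  = fabs(a - b);
    double scale = fmax(fabs(a), fabs(b));
    return diff <= fmax(rel_tol * scale, abs_tol);
}

For example, nearly_equal(actual_result, EXPECTED_RESULT, 1e-12, 1e-15) tolerates drift of roughly one part in 10^12 of the value's magnitude.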