Assembly differences between unrolled for-loops cause differing float results

Time:11-22

Consider the below setup:

typedef struct
{
    float d;
} InnerStruct;

typedef struct
{
    InnerStruct **c;
} OuterStruct;


float TestFunc(OuterStruct *b)
{
    float a = 0.0f;
    for (int i = 0; i < 8; i++)
        a += b->c[i]->d;
    return a;
}

The for loop in TestFunc exactly replicates one in another function that I'm testing. Both loops are unrolled by gcc (4.9.2) but yield slightly different assembly after doing so.

Assembly for my test loop:                  Assembly for the original loop:

lwz       r9,-0x725C(r13)                   lwz       r9,0x4(r3)    
lwz       r8,0x4(r9)                        lwz       r8,0x8(r9)    
lwz       r10,0x0(r9)                       lwz       r10,0x4(r9)   
lwz       r11,0x8(r9)                       lwz       r11,0x0C(r9)  
lwz       r4,0x4(r8)                        lwz       r3,0x4(r8)    
lwz       r10,0x4(r10)                      lwz       r10,0x4(r10)  
lwz       r8,0x4(r11)                       lwz       r0,0x4(r11)   
lwz       r11,0x0C(r9)                      lwz       r11,0x10(r9)  
efsadd    r4,r4,r10                         efsadd    r3,r3,r10
lwz       r10,0x10(r9)                      lwz       r8,0x14(r9)   
lwz       r7,0x4(r11)                       lwz       r10,0x4(r11)  
lwz       r11,0x14(r9)                      lwz       r11,0x18(r9)  
efsadd    r4,r4,r8                          efsadd    r3,r3,r0
lwz       r8,0x4(r10)                       lwz       r0,0x4(r8)    
lwz       r10,0x4(r11)                      lwz       r8,0x0(r9)    
lwz       r11,0x18(r9)                      lwz       r11,0x4(r11)  
efsadd    r4,r4,r7                          efsadd    r3,r3,r10
lwz       r9,0x1C(r9)                       lwz       r10,0x1C(r9)  
lwz       r11,0x4(r11)                      lwz       r9,0x4(r8)    
lwz       r9,0x4(r9)                        efsadd    r3,r3,r0
efsadd    r4,r4,r8                          lwz       r0,0x4(r10)   
efsadd    r4,r4,r10                         efsadd    r3,r3,r11
efsadd    r4,r4,r11                         efsadd    r3,r3,r9
efsadd    r4,r4,r9                          efsadd    r3,r3,r0

The problem is that the float values the two loops return are not exactly the same, and I can't change the original loop, so I need to modify the test loop to return the same values. I believe the test's assembly is equivalent to adding each element one after another. I'm not very familiar with assembly, so I'm not sure how the differences above translate into C. I know the unrolling is involved because if I add a print to the loops, they don't unroll and the results match exactly, as expected.
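Reading the two listings, and assuming offset 0x0 from the array base is element 0, the test loop appears to add the elements in index order 0 to 7, while the original loop appears to add them in the order 1, 2, 3, 4, 5, 6, 0, 7. Float addition is not associative, so two such orders can round differently. Below is a minimal sketch of that effect; `sum_in_order` and the two index tables are hypothetical names for illustration, not part of the original code.

```c
#include <stddef.h>

/* Index orders as read from the two efsadd sequences above.
   This is an interpretation of the listings, not something the
   author confirmed. */
static const size_t TEST_ORDER[8]     = {0, 1, 2, 3, 4, 5, 6, 7};
static const size_t ORIGINAL_ORDER[8] = {1, 2, 3, 4, 5, 6, 0, 7};

/* Adds v[order[0]], v[order[1]], ... one after another in single
   precision. The volatile accumulator forces every partial sum to be
   rounded to float and keeps the compiler from reordering the adds. */
float sum_in_order(const float *v, const size_t *order, size_t n)
{
    volatile float acc = v[order[0]];
    for (size_t i = 1; i < n; i++)
        acc = acc + v[order[i]];
    return acc;
}
```

With the made-up input {1e8f, 1, 1, 1, 1, 1, 1, -1e8f}, `TEST_ORDER` yields 0 and `ORIGINAL_ORDER` yields 8, because the small values are absorbed when they are added to 1e8f one at a time. Under fast-math the compiler may pick either order (or another one), so matching it from the C source is not something the test can rely on.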

CodePudding user response:

I presume this is for unit-testing the one function with another.

In general, floating-point calculations in C or C++ are not guaranteed to be bit-for-bit reproducible, and it is not usually considered legitimate to expect them to be.

The Java language standard requires exactly reproducible floating-point results. This requirement is a constant source of hatred against Java, with various accusations that making the results reproducible usually makes them less accurate and sometimes makes the code much slower too.

If you are doing your testing in C or C++, then I would suggest this approach:

Calculate the result as best you can, with both high precision and high accuracy. In this case the input data are in 32-bit float, so convert them all to 64-bit float before calculating the expected result.

If the inputs were in double (and you don't have a bigger long double type), then sort the values into order and add them up smallest to largest. This generally keeps the loss of accuracy to a minimum.
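The two ways of computing the expected value could be sketched as follows. The function names are hypothetical, and "smallest to largest" is taken here to mean ascending magnitude, which is an assumption about the intent.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Reference for float inputs: widen each value and accumulate in double. */
double reference_sum_float(const float *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)v[i];
    return acc;
}

/* qsort comparator: ascending by absolute value (assumed ordering). */
static int cmp_by_magnitude(const void *pa, const void *pb)
{
    double a = fabs(*(const double *)pa);
    double b = fabs(*(const double *)pb);
    return (a > b) - (a < b);
}

/* Reference for double inputs: sort by magnitude, then add the values
   from smallest to largest. Note that this sorts the caller's array
   in place. */
double reference_sum_sorted(double *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_by_magnitude);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
```

Adding the small values first lets them accumulate into something large enough to survive the later additions, instead of being rounded away one by one.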

Once you have your expected result, test that the function output matches it within some bounds.

There are two approaches to setting what accuracy you require to consider the test as a pass:

One approach is to check what the real physical meaning of the number is and what accuracy you actually require.

The other approach is to just require that the result is accurate to within a few least-significant bits of the ideal result, i.e., that the error is less than a few times the ideal result times FLT_EPSILON.
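The second approach might look something like this. `close_to_expected` and its `ulps` parameter are hypothetical names, and the right multiple of FLT_EPSILON is a judgment call for each test.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Returns true if `actual` is within `ulps` times |expected| * FLT_EPSILON
   of the high-precision expected value. */
bool close_to_expected(float actual, double expected, double ulps)
{
    double tolerance = ulps * fabs(expected) * FLT_EPSILON;
    return fabs((double)actual - expected) <= tolerance;
}
```

A purely relative bound like this collapses to zero when the expected value is zero, and it can be too strict when large terms cancel, so a test may also want an absolute floor derived from the magnitude of the inputs.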

CodePudding user response:

Disabling fast-math seems to fix this issue. Thanks to @njuffa for the suggestion. I was hoping to be able to design the test function around this optimization, but it doesn't seem to be possible. At least I know what the issue is now. Appreciate everyone's help on the problem!
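For anyone hitting the same thing: GCC defines the preprocessor macro `__FAST_MATH__` when `-ffast-math` is in effect, so a test can at least detect that it was built that way. A small sketch follows; the function name is hypothetical, and the macro may not be set if only individual sub-options such as `-fassociative-math` are enabled.

```c
/* Reports whether this translation unit was compiled with -ffast-math,
   based on the __FAST_MATH__ macro that GCC defines for that option. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}
```

A test that expects bit-exact sums could skip itself, or fall back to a tolerance check, whenever this returns 1.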
