Float Arithmetic inconsistent between golang programs

Time:10-13

When decoding audio files with pion/opus I will occasionally get values that are incorrect.

I have narrowed it down to the following code. When this routine runs inside the Opus decoder, I get a different value than when I run it outside. When the two floats are added together, the rightmost bit differs. The difference eventually becomes a problem as the program runs longer.

Is this a bug or expected behavior? I don't know how to debug this further or dump the state of my program to understand more.

Outside decoder

package main

import (
    "fmt"
    "math"
)

func main() {
    a := math.Float32frombits(uint32(955684399))
    b := math.Float32frombits(uint32(927295728))

    fmt.Printf("%b\n", math.Float32bits(a))
    fmt.Printf("%b\n", math.Float32bits(b))
    fmt.Printf("%b\n", math.Float32bits(a + b))
}

Returns

111000111101101001011000101111
110111010001010110100011110000
111001000001111010000110100110

Inside decoder

    fmt.Printf("%b\n", math.Float32bits(lpcVal))
    fmt.Printf("%b\n", math.Float32bits(val))
    fmt.Printf("%b\n", math.Float32bits(lpcVal + val))

Returns

111000111101101001011000101111
110111010001010110100011110000
111001000001111010000110100111

CodePudding user response:

This was happening because of fused multiply-add (FMA). The compiler was combining a multiplication and an addition into a single operation that rounds once instead of twice.

You can read more about it in the Go language specification, in the section "Floating-point operators".

The change I made to my code was

 - lpcVal += currentLPCVal * (aQ12 / 4096.0)
 + lpcVal = float32(lpcVal) + float32(currentLPCVal)*float32(aQ12)/float32(4096.0)

Thank you to Bryan C. Mills for answering this on the #performance channel on the Gophers slack.
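
To make the effect of fusing visible on any platform, here is a minimal sketch that calls math.FMA explicitly instead of relying on the compiler. The helper names fused and unfused and the input values are made up for illustration and are not taken from the decoder. math.FMA works on float64, whereas the decoder uses float32, but the principle is the same: the fused form rounds once, while the form with an explicit conversion rounds the product first.

```go
package main

import (
	"fmt"
	"math"
)

// fused computes x*y + z with a single rounding at the very end.
func fused(x, y, z float64) float64 {
	return math.FMA(x, y, z)
}

// unfused rounds the product to float64 before adding z. Per the Go spec,
// the explicit conversion prevents the compiler from fusing the two steps.
func unfused(x, y, z float64) float64 {
	return float64(x*y) + z
}

func main() {
	// Illustrative inputs: x*x = 1 + 2^-29 + 2^-60, and the 2^-60 term
	// does not survive rounding the product to float64.
	x := 1 + math.Ldexp(1, -30)
	z := -1.0

	fmt.Printf("fused:   %064b\n", math.Float64bits(fused(x, x, z)))
	fmt.Printf("unfused: %064b\n", math.Float64bits(unfused(x, x, z)))
	fmt.Println("equal:", fused(x, x, z) == unfused(x, x, z))
}
```

With these inputs the two results differ in their low-order bits, which is the same kind of discrepancy as in the question, only produced deliberately.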

CodePudding user response:

I guess that lpcVal and val are not Float32 but rather Float64.

If that is the case, then you are performing two different operations:

  • in the former case, you do Float32bits(lpcVal) + Float32bits(val)
  • in the latter case, you do Float32bits(lpcVal + val)

In binary, the two 32-bit floats are:

1.11101101001011000101111 * 2^-14
1.10001010110100011110000 * 2^-17

The exact sum is

1.000011110100001101001101 * 2^-13

which is an exact tie between two representable Float32 values,
so the result is rounded to the Float32 with an even significand:

1.00001111010000110100110 * 2^-13

But if lpcVal and val are Float64, then instead of 23 bits after the binary point they have 52 (29 more).

If a single bit among those 29 extra bits is nonzero, the result might not be an exact tie but slightly larger than the exact tie.
Once converted to nearest Float32, that will be

1.00001111010000110100111 * 2^-13

Since we have no idea what lpcVal and val contain in those low-order bits, anything can happen, even without the use of FMA operations.
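
The tie-breaking argument above can be checked with a short sketch. It assumes, as this answer does, that the sum is formed in Float64 and only then narrowed to Float32 (in Go that narrowing needs an explicit float32(...) conversion). The helpers questionInputs, sum32, sumVia64 and nudge are hypothetical names, and the 2^-60 added by nudge is an arbitrary stand-in for whatever low-order bits the real values might carry.

```go
package main

import (
	"fmt"
	"math"
)

// questionInputs returns the two float32 values from the question.
func questionInputs() (float32, float32) {
	return math.Float32frombits(955684399), math.Float32frombits(927295728)
}

// sum32 adds in float32 and returns the bits of the float32 result.
func sum32(a, b float32) uint32 {
	return math.Float32bits(a + b)
}

// sumVia64 adds in float64 and only then rounds the sum to float32.
func sumVia64(a, b float64) uint32 {
	return math.Float32bits(float32(a + b))
}

// nudge widens b to float64 and adds 2^-60, an arbitrary tiny amount that
// a float32 of this magnitude cannot represent.
func nudge(b float32) float64 {
	return float64(b) + math.Ldexp(1, -60)
}

func main() {
	a, b := questionInputs()

	fmt.Printf("%b\n", sum32(a, b))                      // exact tie, rounds to even
	fmt.Printf("%b\n", sumVia64(float64(a), float64(b))) // same tie, same result
	fmt.Printf("%b\n", sumVia64(float64(a), nudge(b)))   // just above the tie, rounds up
}
```

The first two lines end in ...100110, matching the "outside decoder" output, while the third ends in ...100111, matching the "inside decoder" output.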
