Misunderstanding Go Language specification on floating-point rounding

Time:04-17

The Go language specification, in the section on constant expressions, states: "A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa."


Does the sentence "This rounding may cause a floating-point constant expression to be invalid in an integer context" refer to something like the following?

package main

import "fmt"

func main() {
    a := 853784574674.23846278367
    fmt.Println(int8(a)) // output: 0
}

CodePudding user response:

An int8 is a signed 8-bit integer and can only hold values from -128 to 127. The value of a is far outside that range, which is why the int8(a) conversion gives an unexpected value.

CodePudding user response:

The quoted part from the spec does not apply to your example, as a is not a constant expression but a variable, so int8(a) is converting a non-constant expression.
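To make the distinction between constant and non-constant conversions concrete, here is a minimal sketch. The names lit and toInt64 are made up for illustration. It shows that the same literal is rejected at compile time when converted as a constant, but converted at run time once it is stored in a variable. An in-range target type (int64) is used so that the run-time result is well defined; for an out-of-range target such as int8, the spec leaves the result of a non-constant conversion implementation-dependent.

```go
package main

import "fmt"

// lit is an untyped floating-point constant, the same literal as in the question.
const lit = 853784574674.23846278367

// toInt64 converts a float64 value at run time; the fraction is truncated.
func toInt64(f float64) int64 {
	return int64(f)
}

func main() {
	// Uncommenting the next line makes the program fail to compile, because
	// a constant conversion must yield a value representable in the target type:
	// fmt.Println(int8(lit))

	a := lit                // a is a float64 variable, no longer a constant
	fmt.Println(toInt64(a)) // 853784574674
}
```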

The quoted part means that although constants are represented with much higher precision than the builtin types (e.g. float64 or int64), the precision a compiler has to implement is not infinite, for practical reasons. Even if each literal is representable exactly, operations on them may be carried out with intermediate rounding, so the result may not be mathematically correct.

The spec includes the minimum supportable precision:

Implementation restriction: Although numeric constants have arbitrary precision in the language, a compiler may implement them using an internal representation with limited precision. That said, every implementation must:

  • Represent integer constants with at least 256 bits.
  • Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed binary exponent of at least 16 bits.
  • Give an error if unable to represent an integer constant precisely.
  • Give an error if unable to represent a floating-point or complex constant due to overflow.
  • Round to the nearest representable constant if unable to represent a floating-point or complex constant due to limits on precision.
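As a small sketch of the integer guarantee in this list, the constants below (huge and small are illustrative names) use a value that needs 201 bits. That is far beyond int64 but well within the required 256 bits, so the arithmetic on it is exact.

```go
package main

import "fmt"

const (
	huge  = 1 << 200    // too large for any builtin integer type
	small = huge >> 198 // evaluated exactly at compile time
)

func main() {
	fmt.Println(small) // 4
	// fmt.Println(huge) // would not compile: the constant overflows int
}
```

Floating-point constants differ in the last bullet: when precision runs out, the compiler rounds instead of reporting an error.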

For example:

package main

import "fmt"

const (
    x = 1e100000 + 1
    y = 1e100000
)

func main() {
    fmt.Println(x - y)
}

This code should output 1, since x is 1 larger than y. Running it on the Go Playground outputs 0, because the constant expression x - y is evaluated with rounding and the 1 is lost as a result.
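The same effect can be imitated at run time with math/big, which lets you choose the mantissa size explicitly. This is only a sketch of the mechanism, not a description of how any particular compiler stores constants: the helper plusOneMinusSelf and the precisions 256 and 4096 are chosen purely for illustration.

```go
package main

import (
	"fmt"
	"math/big"
)

// plusOneMinusSelf parses lit into a float with a prec-bit mantissa and
// returns (lit + 1) - lit, with every step rounded to prec bits.
func plusOneMinusSelf(lit string, prec uint) *big.Float {
	y, ok := new(big.Float).SetPrec(prec).SetString(lit)
	if !ok {
		panic("invalid literal: " + lit)
	}
	x := new(big.Float).SetPrec(prec).Add(y, big.NewFloat(1))
	return new(big.Float).SetPrec(prec).Sub(x, y)
}

func main() {
	// A 256-bit mantissa cannot hold both the leading bits of 1e100000
	// and the trailing 1, so the 1 is rounded away.
	fmt.Println(plusOneMinusSelf("1e100000", 256).Text('g', 10)) // 0

	// A 4096-bit mantissa is wide enough to hold 1e1000 + 1 exactly.
	fmt.Println(plusOneMinusSelf("1e1000", 4096).Text('g', 10)) // 1
}
```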

If we use smaller constants, we get the correct result:

package main

import "fmt"

const (
    x = 1e1000 + 1
    y = 1e1000
)

func main() {
    fmt.Println(x - y)
}

This outputs the mathematically correct result, 1.
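To tie this back to the "integer context" wording in the spec, here is a sketch of the "vice versa" case: an expression that is not integral in exact arithmetic but can become integral through rounding. The names large, frac and n are made up for illustration. It assumes the compiler rounds 1e100000 + 0.5 the same way as in the first example above, which is what I would expect from the standard Go toolchain. A compiler that kept more precision would be entitled to reject the declaration of n, because 0.5 cannot be used as an int.

```go
package main

import "fmt"

const (
	large = 1e100000
	frac  = (large + 0.5) - large // exactly 0.5 in infinite-precision arithmetic
)

// n is an integer context: this declaration compiles only if the compiler's
// computed value of frac is integral.
var n int = frac

func main() {
	fmt.Println(n)
}
```

If the program compiles, the 0.5 has been lost to rounding, and n holds the rounded value of the expression, not the mathematical one.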
