Home > Back-end >  How to express float constants precisely in source code
How to express float constants precisely in source code

Time:08-16

I have some C 11 code generated via a code generator that contains a large array of floats, and I want to make sure that the compiled values are precisely the same as the compiled values in the generator (assuming that both depend on the same float ISO norm)

So I figured the best way to do it is to store the values as hex representations and interpret them as float in the code.

Edit for Clarification: The code generator takes the float values and converts them to their corresponding hex representations. The target code is supposed to convert back to float.

It looks something like this:

const unsigned int data[3] = { 0x3d13f407U, 0x3ea27884U, 0xbe072dddU};
float const* ptr = reinterpret_cast<float const*>(&data[0]);

This works and gives me access to all the data element as floats, but I recently stumbled upon the fact that this is actually undefined behavior and only works because my compiler resolves it the way I intended:

https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8

https://en.cppreference.com/w/cpp/language/reinterpret_cast.

The standard basically says that reinterpret_cast is not defined between POD pointers of different type.

So basically I have three options:

  1. Use memcopy and hope that the compiler will be able to optimize this

  2. Store the data not as hex-values but in a different way.

  3. Use std::bit_cast from C 20.

I cannot use 3) because I'm stuck with C 11.

I don't have the resources to store the data array twice, so I would have to rely on the compiler to optimize this. Due to this, I don't particularly like 1) because it could stop working if I changed compilers or compiler settings.

So that leaves me with 2):

Is there a standardized way to express float values in source code so that they map to the exact float value when compiled? Does the ISO float standard define this in a way that guarantees that any compiler will follow the interpretation? I imagine if I deviate from the way the compiler expects, I could run the risk that the float "neighbor" of the number I actually want is used.

I would also take alternative ideas if there is an option 4 I forgot.

CodePudding user response:

How to express float constants precisely in source code

Use hexadecimal floating point literals. Assuming some endianess for the hexes you presented:

float floats[] = { 0x1.27e80ep-5, 0x1.44f108p-2, -0x1.0e5bbap-3 };

CodePudding user response:

If you have the generated code produce the full representation of the floating-point value—all of the decimal digits needed to show its exact value—then a C 11 compiler is required to parse the number exactly.

C 11 draft N3092 2.14.4 1 says, of a floating literal:

… The exponent, if present, indicates the power of 10 by which the significant [likely typo, should be “significand”] part is to be scaled. If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner…

Thus, if the floating literal does not have all the digits needed to show the exact value, the implementation may round it either upward or downward, as the implementation defines. But if it does have all the digits, then the value represented by the floating literal is representable in the floating-point format, and so its value must be the result of the parsing.

  • Related