I have the hex literal 400D99999999999A, which is the bit pattern for 3.7 as a double.
How do I write this in C? I saw this page about floating_literal and hex. Maybe it's obvious and I need to sleep, but I'm not seeing how to write the bit pattern as a float. I understand it's supposed to let a person write a more precise fraction, but I'm not sure how to translate a bit pattern to a literal.
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    double d = 0x400D99999999999Ap0;
    printf("%f\n", d); // incorrect
    unsigned long l = 0x400D99999999999A;
    memcpy(&d, &l, 8);
    printf("%f\n", d); // correct, 3.7
    return 0;
}
CodePudding user response:
The value you're trying to use is an IEEE bit pattern. C doesn't support this directly. To get the desired bit pattern, you need to specify the mantissa, as an ordinary hex integer, along with a power-of-two exponent.
In this case, the desired IEEE bit pattern is 400D99999999999A. If you strip off the sign bit and the exponent, you're left with D99999999999A. There's an implied leading 1 bit, so to get the actual mantissa value, that needs to be explicitly added, giving 1D99999999999A. This represents the mantissa as an integer with no fractional part. It then needs to be scaled, in this case by a power-of-two exponent value of -51. So the desired constant is:

double d = 0x1D99999999999Ap-51;

If you plug this into your code, you will get the desired bit pattern of 400D99999999999A.
CodePudding user response:
how to write the bitpattern as a float.

The bit pattern 0x400D99999999999A commonly encodes (alternate encodings exist) the double with a value of about 3.7.*1

double d;
unsigned long l = 0x400D99999999999A;
// Assume same size, same endian
memcpy(&d, &l, 8);
printf("%g\n", d);
// output 3.7

To write the value out using "%a" format, with a hexadecimal significand and a decimal power-of-2 exponent:

printf("%a\n", d);
// output 0x1.d99999999999ap+1

The double constant (not literal) 0x1.d99999999999ap+1 has an explicit 1 bit followed by the lower 52 bits of 0x400D99999999999A, and the exponent of 1 comes from the biased exponent bits (the 11 bits after the sign bit), 0x400, minus the bias of 0x3FF.

Now code can use double d = 0x1.d99999999999ap+1; instead of the memcpy() to initialize d.

*1 The closest double to 3.7 is exactly
3.7000000000000001776356839400250464677810668945312
CodePudding user response:
The following program shows how to interpret a string of bits as a double, using either the native double format or using the IEEE-754 double-precision binary format (binary64).

#include <math.h>
#include <stdint.h>
#include <string.h>

// Create a mask of n bits, in the low bits.
#define Mask(n) (((uint64_t) 1 << (n)) - 1)

/* Given a uint64_t containing 64 bits, this function interprets them in the
   native double format.
*/
double InterpretNativeDouble(uint64_t bits)
{
    double result;
    _Static_assert(sizeof result == sizeof bits, "double must be 64 bits");
    // Copy the bits into a native double.
    memcpy(&result, &bits, sizeof result);
    return result;
}

/* Given a uint64_t containing 64 bits, this function interprets them in the
   IEEE-754 double-precision binary format. (Checking that the native double
   format has sufficient bounds and precision to represent the result is
   omitted. For NaN results, a NaN is returned, but the signaling
   characteristic and the payload bits are not supported.)
*/
double InterpretDouble(uint64_t bits)
{
    /* Set some parameters of the format. (This routine is not fully
       parameterized for all IEEE-754 binary formats; some hardcoded constants
       are used.)
    */
    static const int Emax = 1023;    // Maximum exponent.
    static const int Precision = 53; // Precision (number of digits).

    // Separate the fields in the encoding.
    int SignField = bits >> 63;
    int ExponentField = bits >> 52 & Mask(11);
    uint64_t SignificandField = bits & Mask(52);

    // Interpret the exponent and significand fields.
    int Exponent;
    double Significand;
    switch (ExponentField)
    {
        /* An exponent field of all zero bits indicates a subnormal number,
           for which the exponent is fixed at its minimum and the leading bit
           of the significand is zero. This includes zero, which is not
           classified as a subnormal number but is consistent in the encoding.
        */
        case 0:
            Exponent = 1 - Emax;
            Significand = 0 + ldexp(SignificandField, 1-Precision);
                // ldexp(x, y) computes x * pow(2, y).
            break;

        /* An exponent field of all one bits indicates a NaN or infinity,
           according to whether the significand field is zero or not.
        */
        case Mask(11):
            Exponent = 0;
            Significand = SignificandField ? NAN : INFINITY;
            break;

        /* All other exponent fields indicate normal numbers, for which the
           exponent is encoded with a bias (equal to Emax) and the leading bit
           of the significand is one.
        */
        default:
            Exponent = ExponentField - Emax;
            Significand = 1 + ldexp(SignificandField, 1-Precision);
            break;
    }

    // Combine the exponent and significand.
    Significand = ldexp(Significand, Exponent);

    // Interpret the sign field.
    if (SignField)
        Significand = -Significand;

    return Significand;
}

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint64_t bits = 0x400D99999999999A;

    printf("The bits 0x%" PRIx64 " interpreted as:\n", bits);
    printf("\ta native double represents %.9999g, and\n",
        InterpretNativeDouble(bits));
    printf("\tan IEEE-754 double-precision datum represents %.9999g.\n",
        InterpretDouble(bits));
}
CodePudding user response:
As an IEEE-754 double-precision value, that bit pattern 400D99999999999A actually consists of three parts:

- the first bit, 0, is the sign;
- the next 11 bits, 10000000000 or 0x400, are the exponent; and
- the remaining 52 bits, 0xD99999999999A, are the significand (also known as the "mantissa").

But the exponent has a bias of 1023 (0x3ff), so numerically it's 0x400 - 0x3ff = 1. And the significand is all fractional, and has an implicit 1 bit to its left, so it's really 0x1.D99999999999A.
So the actual number this represents is

0x1.D99999999999A × 2¹

which is about 1.85 × 2, or 3.7.
Or, using C's "hex float" or %a representation, it's 0x1.D99999999999Ap1.
In "hex float" notation, the leading 1 and the decimal point (really a "radix point") are explicit, and the p at the end indicates a power-of-two exponent.
Although the decomposition I've shown here may seem reasonably straightforward, actually writing code to reliably decompose a 64-bit number like 400D99999999999A into its three component parts, and manipulate and recombine them to determine what floating-point value they represent (or even to form an equivalent hex float constant like 0x1.D99999999999Ap1) can be surprisingly tricky. See Eric Postpischil's answer for more of the details.