Float max/min is
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
Compiling to assembly, I see the literal is 0xffefffffffffffff. I don't understand how to write it in float literal form. I tried -0xFFFFFFFFFFFFFp972, which resulted in 0xFFEFFFFFFFFFFFFE. Notice the last digit is E instead of F. I have no idea why the last bit is wrong or why 972 gave me the closest number. I don't understand what I should be doing with the exponent bias either. I used 13 F's because that would set 52 bits (the number of bits in the mantissa), but I'm clueless about everything else.
I want to be able to write double min/max as a literal and understand it well enough to parse it into an 8-byte hex value.
CodePudding user response:
How do I write float max as a float literal?
Use FLT_MAX. If making your own code, use exponential notation either as hex (preferred) or decimal. If in decimal, use FLT_DECIMAL_DIG significant digits. Any more is not informative. Append an f.
#include <float.h>
#include <stdio.h>
int main(void) {
printf("%a\n", FLT_MAX);
printf("%.*g\n", FLT_DECIMAL_DIG, FLT_MAX);
float m0 = FLT_MAX;
float m1 = 0x1.fffffep+127f;
float m2 = 3.40282347e+38f;
printf("%d %d\n", m1 == m0, m2 == m0);
}
Sample output
0x1.fffffep+127
3.40282347e+38
1 1
Likewise for double, yet no f.
printf("%a\n", DBL_MAX);
printf("%.*g\n", DBL_DECIMAL_DIG, DBL_MAX);
0x1.fffffffffffffp+1023
1.7976931348623157e+308
double m0 = DBL_MAX;
double m1 = 0x1.fffffffffffffp+1023;
double m2 = 1.7976931348623157e+308;
Rare machines will have different max values.
CodePudding user response:
#include <float.h> // or possibly math.h or values.h
int main(void) {
float f = FLT_MAX; // or MAXFLOAT
double d = DBL_MAX; // or MAXDOUBLE
}
and then run that through the pre-processor:
$ cpp 1.c | grep 'float f\|double d'
float f = 3.40282346638528859811704183484516925e+38F;
double d = ((double)1.79769313486231570814527423731704357e+308L);