How do I write float max in float literal form and parse it?

The largest double (negate it to get the most negative one) is

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368

Compiling to assembly, I see the constant emitted as 0xffefffffffffffff. I can't work out how to write it as a floating-point literal. I tried -0xFFFFFFFFFFFFFp972, which produced 0xFFEFFFFFFFFFFFFE. Notice the last digit is E instead of F. I have no idea why the last bit is wrong, or why an exponent of 972 gave me the closest number. I also don't understand what I should be doing with the exponent bias. I used 13 F's because that sets 52 bits (the number of bits in the mantissa), but beyond that I'm clueless.

I want to be able to write double min/max as a literal, and understand it well enough to parse it into an 8-byte hex value.

Answer:

How do I write float max as a float literal?

Use FLT_MAX from <float.h>. If you must write the literal yourself, use exponential notation, either hexadecimal (preferred, since it is exact) or decimal. In decimal, use FLT_DECIMAL_DIG significant digits; more digits add no information. Append an f suffix so the constant has type float.

#include <float.h>
#include <stdio.h>

int main(void) {
  printf("%a\n", FLT_MAX);
  printf("%.*g\n", FLT_DECIMAL_DIG, FLT_MAX);
  float m0 = FLT_MAX;
  float m1 = 0x1.fffffep+127f;  // hexadecimal: exact
  float m2 = 3.40282347e+38f;   // decimal: FLT_DECIMAL_DIG (9) significant digits
  printf("%d %d\n", m1 == m0, m2 == m0);
}

Sample output

0x1.fffffep+127
3.40282347e+38
1 1

Likewise for double, but without the f suffix:

printf("%a\n", DBL_MAX);
printf("%.*g\n", DBL_DECIMAL_DIG, DBL_MAX);

0x1.fffffffffffffp+1023
1.7976931348623157e+308

double m0 = DBL_MAX;
double m1 = 0x1.fffffffffffffp+1023;
double m2 = 1.7976931348623157e+308;

Rare machines whose float and double are not IEEE 754 binary32 and binary64 will have different max values.

Answer:

#include <float.h> // or possibly <math.h> (MAXFLOAT) or <values.h> (MAXDOUBLE)

int main(void) {
   float f = FLT_MAX; // or MAXFLOAT
   double d = DBL_MAX; // or MAXDOUBLE
   return 0;
}

Save that as 1.c and run it through the preprocessor to see what the macros expand to:

$ cpp 1.c | grep 'float f\|double d'
 float f = 3.40282346638528859811704183484516925e+38F;
 double d = ((double)1.79769313486231570814527423731704357e+308L);