How to check whether a float is positive denormalized, negative denormalized, or not denormalized?
I tried to do:
int is_denorm(float f)
{
unsigned int x = *(int*)&f;
unsigned expMask = (1 << 8) - 1;
expMask = expMask << 23;
// now I need to check whether the exponent bits are all zero; how can I do it?
}
Answer:
Regarding "check if a float is positive denormalized/negative denormalized or not denormalized": note that both C and IEEE-754 use the term subnormal, not denormal.
#include <math.h>
// 1 subnormal
// -1 -subnormal
// 0 not subnormal
int subnormalness(float x) {
if (fpclassify(x) == FP_SUBNORMAL) {
return signbit(x) ? -1 : 1;
}
return 0;
}
Avoid code like *(int*)&f; and expMask << 23, ... as it runs into aliasing concerns, float encoding issues, and assumptions about the size of unsigned.
Sometimes it is desirable to classify 0.0 (and -0.0) like the subnormals:
int subnormalzeroness(float x) {
  switch (fpclassify(x)) {
    case FP_SUBNORMAL: // fall through
    case FP_ZERO:
      return signbit(x) ? -1 : 1;
  }
  return 0;
}
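Code such as the following also works when NaNs behave per IEEE-754 and fail < comparisons; otherwise, append && !isnan(x) to the return expression. Note that it only reports whether x is subnormal or zero (1) or not (0), without the sign.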
#include <float.h>
#include <math.h>

int subnormalzeroness_alt(float x) {
  return fabsf(x) < FLT_MIN;
}
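If the sign is wanted as well, the FLT_MIN threshold test can be combined with signbit, matching the -1/1/0 convention of subnormalzeroness. This is a minimal sketch: the name subnormalzeroness_alt_signed is hypothetical, and the explicit isnan check is included so the result does not depend on how the implementation handles comparisons with NaN.

```c
#include <float.h>
#include <math.h>

/* Hypothetical signed variant of subnormalzeroness_alt:
   returns 1 for +0.0 and positive subnormals,
   -1 for -0.0 and negative subnormals,
   0 otherwise (normal values, infinities and NaNs). */
int subnormalzeroness_alt_signed(float x) {
  /* FLT_MIN is the smallest positive normal float, so every zero and
     every subnormal has a magnitude below it. The isnan guard keeps
     NaNs out even if NaN comparisons do not follow IEEE-754. */
  if (!isnan(x) && fabsf(x) < FLT_MIN) {
    return signbit(x) ? -1 : 1;
  }
  return 0;
}
```

For example, subnormalzeroness_alt_signed(-0.0f) returns -1, whereas x < 0 would not flag -0.0 as negative.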
Answer:
Instead of making assumptions about the representation of float and unsigned int, including size, endianness and encoding, you should use the fpclassify macro defined in <math.h>, which is specifically designed for this purpose:
#include <math.h>

int is_denorm(float f) {
  return fpclassify(f) == FP_SUBNORMAL;
}
Depending on its argument, fpclassify(x) evaluates to one of the number classification macros:
- FP_INFINITE
- FP_NAN
- FP_NORMAL
- FP_SUBNORMAL
- FP_ZERO
They represent the mutually exclusive kinds of floating-point values. They expand to integer constant expressions with distinct values. Additional implementation-defined floating-point classifications, with macro definitions beginning with FP_ and an uppercase letter, may also be specified by the implementation.
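As a sketch of how these classification macros are typically consumed, a switch over fpclassify can map each class to a name. The helper fp_class_name is hypothetical; its default branch catches any additional implementation-defined FP_ classifications, since those cannot be listed portably.

```c
#include <math.h>

/* Hypothetical helper: names the class of a float as reported by
   fpclassify. The parameter is float on purpose, so the value is
   classified in float precision rather than after promotion. */
const char *fp_class_name(float x) {
  switch (fpclassify(x)) {
    case FP_INFINITE:  return "infinite";
    case FP_NAN:       return "nan";
    case FP_NORMAL:    return "normal";
    case FP_SUBNORMAL: return "subnormal";
    case FP_ZERO:      return "zero";
    default:           return "implementation-defined";
  }
}
```

On an implementation where float is IEEE 754 binary32, fp_class_name(1e-40f) yields "subnormal". The float parameter matters: the same value held in a double is well within the normal range of double, so classifying it as a double would give FP_NORMAL.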
The signbit macro can be used to extract the sign of a floating-point value (of type float, double or long double). Note that signbit(x) evaluates to nonzero whenever the sign bit of x is set: for negative values, for NaNs with the sign bit set, and for -0.0, for which x < 0 evaluates to false.
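The -0.0 case can be shown directly. In this small sketch (is_negative_zero is a hypothetical name), x == 0.0f is true for both zeros, so only signbit can tell them apart; a test such as x < 0 would miss -0.0 entirely.

```c
#include <math.h>

/* Hypothetical helper: true only for negative zero.
   -0.0f compares equal to 0.0f, so the sign has to be read
   with signbit rather than with a < comparison. */
int is_negative_zero(float x) {
  return x == 0.0f && signbit(x) != 0;
}
```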
Note that your approach has some problems even on architectures using IEEE 754 single-precision floats and 32-bit integers with the same endianness:
- to avoid aliasing issues, instead of unsigned int x = *(int *)&f; you should write uint32_t x; memcpy(&x, &f, sizeof x);
- testing the exponent bits is not sufficient to detect subnormal values, as the values 0.0 and -0.0 also have all exponent bits set to 0; a subnormal value additionally has a nonzero fraction (mantissa) field.
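With both corrections applied, the bit-level approach from the question might look like the following. This is only a sketch under stated assumptions: float is IEEE 754 binary32 and has the same size and endianness as uint32_t (the size is checked at compile time; the encoding and endianness are not). The name subnormalness_bits is hypothetical, and the fpclassify version remains the portable choice.

```c
#include <stdint.h>
#include <string.h>

/* Sketch assuming IEEE 754 binary32 floats with the same size and
   endianness as uint32_t. Returns 1 for positive subnormals,
   -1 for negative subnormals and 0 for everything else. */
int subnormalness_bits(float f) {
  uint32_t x;
  _Static_assert(sizeof x == sizeof f, "float must be 32 bits wide");
  memcpy(&x, &f, sizeof x);               /* copy bits; no aliasing violation */

  uint32_t exponent = (x >> 23) & 0xFFu;  /* 8 biased exponent bits */
  uint32_t fraction = x & 0x7FFFFFu;      /* 23 fraction (mantissa) bits */

  /* All-zero exponent with a nonzero fraction means subnormal;
     an all-zero exponent with a zero fraction is +0.0 or -0.0. */
  if (exponent == 0 && fraction != 0) {
    return (x >> 31) ? -1 : 1;            /* bit 31 is the sign bit */
  }
  return 0;
}
```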
Also note that denormalized is not always the same as subnormal. In the IEEE 754 standard, subnormal refers to nonzero numbers smaller in magnitude than the normal numbers, which have an implicit leading 1 in the significand (denormal is no longer used in either the IEEE 754 standard or the C Standard). Other floating-point standards with multiple representations may call other sets of values denormalized.
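For IEEE 754 binary32 (assumed here; C does not require it), the role of the implicit 1 can be written out with sign bit s, 8-bit biased exponent field e and 23-bit fraction field f. The exponent value e = 255, which encodes infinities and NaNs, is omitted:

```latex
\begin{aligned}
\text{normal } (1 \le e \le 254):&\quad v = (-1)^s \left(1 + \frac{f}{2^{23}}\right) 2^{e-127} \\
\text{subnormal } (e = 0,\ f \ne 0):&\quad v = (-1)^s \, \frac{f}{2^{23}} \, 2^{-126} \\
\text{zero } (e = 0,\ f = 0):&\quad v = \pm 0
\end{aligned}
```

So the positive subnormals range from 2^{-149} (FLT_TRUE_MIN in C11) up to (1 - 2^{-23}) \cdot 2^{-126}, just below the smallest positive normal number 2^{-126} (FLT_MIN).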