Largest uint64 which can be accurately represented in a float in C/C++

Time:09-29

I understand that floating point precision has only so many bits. It comes as no surprise that the following code thinks that (float)(UINT64_MAX) and (float)(UINT64_MAX - 1) are equal. I am trying to write a function which would detect this type of, for lack of a better term, "conversion overflow". I thought I could somehow use FLT_MAX, but that's not correct. What's the right way to do this?

#include <iostream>
#include <cstdint>

int main()
{
  uint64_t x1(UINT64_MAX);
  uint64_t x2(UINT64_MAX - 1);
  float f1(static_cast<float>(x1));
  float f2(static_cast<float>(x2));
  std::cout << f1 << " == " << f2 << " = " << (f1 == f2) << std::endl;
  return 0;
}

CodePudding user response:

Largest uint64 which can be accurately represented in a float
What's the right way to do this?

When FLT_RADIX == 2, we are looking for a uint64_t of the form below, where n is the number of significand bits a float can hold. This is usually 24. See FLT_MANT_DIG from <float.h>.

111...(n one bits)...111 000...(64-n bits, all zero)...000

// With FLT_MANT_DIG == 24:
0xFFFFFF0000000000
// computed portably as:
~((1ull << (64 - FLT_MANT_DIG)) - 1)

CodePudding user response:

The following function gives you the highest integer exactly representable in a floating point type such that all smaller positive integers are also exactly representable:

#include <cmath>
#include <limits>

template<typename T>
T max_representable_integer()
{
    // radix^digits: every positive integer up to and including this value is exact.
    return std::pow(T(std::numeric_limits<T>::radix), std::numeric_limits<T>::digits);
}

It does the computation in the floating point type itself because, for some types, the result may not be representable in a uint64_t.
