Largest value representable by a floating-point type smaller than 1

Time:03-08

Is there a way to obtain the greatest value representable by the floating-point type float that is smaller than 1?

I've seen the following definition:

static const double DoubleOneMinusEpsilon = 0x1.fffffffffffffp-1;
static const float FloatOneMinusEpsilon = 0x1.fffffep-1;

But is this really how we should define these values?

According to the Standard, std::numeric_limits<T>::epsilon() is the machine epsilon, that is, the difference between 1.0 and the next value representable by the floating-point type T. But that doesn't necessarily mean that defining the value as T(1) - std::numeric_limits<T>::epsilon() would be better.

CodePudding user response:

You can use the std::nextafter function, which, despite its name, can retrieve the next representable value that is arithmetically before a given starting point, by using an appropriate "to" argument.

And, indeed, when retrieving the closest value less than 1 for the double type (on Windows, using the clang-cl compiler in Visual Studio 2019), the answer is different from the result of the 1 - ε calculation:

#include <iostream>
#include <iomanip>
#include <cmath>
#include <limits>

int main()
{
    double naft = std::nextafter(1.0, 0.0);
    std::cout << std::fixed << std::setprecision(20);
    std::cout << naft << std::endl;
    double neps = 1.0 - std::numeric_limits<double>::epsilon();
    std::cout << neps << std::endl;
    return 0;
}

Output:

0.99999999999999988898
0.99999999999999977796

Note that, when using analogous techniques to determine the closest value that is greater than 1, the nextafter(1.0, 10000.) call gives the same value as the 1 + ε calculation (1.00000000000000022204), as would be expected from the definition of ε.
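
As a rough sketch of the same comparison for any standard floating-point type, the two calculations could be wrapped in small templates. The names largest_below_one and one_minus_epsilon are made up for this illustration; they are not part of the standard library.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: the largest value of type T that is strictly below 1.
template <typename T>
T largest_below_one()
{
    // Step from 1 toward 0; std::nextafter returns the adjacent representable value.
    return std::nextafter(T(1), T(0));
}

// Hypothetical helper: the 1 - epsilon calculation, for comparison.
template <typename T>
T one_minus_epsilon()
{
    return T(1) - std::numeric_limits<T>::epsilon();
}
```

On an IEEE 754 platform, largest_below_one<float>() should be the value written as 0x1.fffffep-1 in the question, while one_minus_epsilon<float>() lies one representable step further away from 1.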

CodePudding user response:

This can be calculated without calling a function by using the characteristics of the floating-point representation specified in the C standard. Epsilon gives the distance between representable numbers just above 1, and radix gives the base used to represent numbers. Values just below 1 have an exponent one lower than that of 1 itself, so the distance between representable numbers just below 1 is epsilon divided by that base:

#include <iostream>
#include <limits>


int main(void)
{
    typedef float Float;

    std::cout << std::hexfloat <<
        1 - std::numeric_limits<Float>::epsilon() / std::numeric_limits<Float>::radix
        << '\n';
}

CodePudding user response:

0.999999940395355224609375 is the largest 32-bit float that is less than 1. The code below demonstrates this:

Mac_3.2.57$cat float2uintTest4.c 
#include <stdio.h>
int main(void){
    union{
        float f;
        unsigned int i;
    } u;
    //u.f=0.9999;
    //printf("as hex: %x\n", u.i); // 0x3f7fffff
    u.i=0x3f800000; // 1.0
    printf("as float: %0.200f\n", u.f);
    u.i=0x3f7fffff; // 1.0-e
          //00111111 01111111 11111111 11111111
          //seeeeeee emmmmmmm mmmmmmmm mmmmmmmm
    printf("as float: %0.200f\n", u.f);

    return(0);
}
Mac_3.2.57$cc float2uintTest4.c 
Mac_3.2.57$./a.out 
as float: 1.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
as float: 0.99999994039535522460937500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

CodePudding user response:

0.99999999999999988897769753748434595763683319091796875 is the largest 64-bit float that is less than 1. The code below demonstrates this:

Mac_3.2.57$cat float2uintTest5.c 
#include <stdio.h>
int main(void){
    union{
        double f;
        unsigned long long i;
    } u;
    u.f = 1.0;
    printf("as float: %0.200f\n", u.f);
    printf("as hex: %llx\n", u.i); //0x3ff0000000000000
//00111111 11110000 00000000 00000000 00000000 00000000 00000000 00000000
//seeeeeee eeeemmm...
    u.i = 0x3fefffffffffffff; //00111111 11101111 11111111 11111111 11111111 11111111 11111111 11111111
    printf("as int: %0.200f\n", u.f); //
    return(0);
}
Mac_3.2.57$cc float2uintTest5.c
Mac_3.2.57$./a.out 
as float: 1.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
as hex: 3ff0000000000000
as int: 0.99999999999999988897769753748434595763683319091796875000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Mac_3.2.57$