Printing the 16-bit float minimum looks inconsistent?

Can someone explain why printing the float16 minimum produces the different results below? Is it by design or a bug?

    In [87]: x=np.finfo(np.float16).min
    
    In [88]: x_array_single=np.array([x])
    
    In [89]: x
    Out[89]: -65500.0
    
    In [90]: x_array_single
    Out[90]: array([-65504.], dtype=float16)

CodePudding user response：

This happens because

the default float-printing strategy is to print only the leading digits necessary to represent the value unambiguously, and then pad with zeros. This way only the "significant" digits are shown. It turns out that '65500' is the "shortest" way of representing "65504", since the 4 is unnecessary:

>>> np.float16('65500') == np.float16('65504')
True
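
As a rough illustration of the "shortest digits that still round-trip" idea, here is a naive sketch that uses only the standard library's `struct` half-precision format. It is not NumPy's actual printing algorithm, and the helper names `to_half_bits` and `shortest_roundtrip` are made up for this example.

```python
import struct

def to_half_bits(value):
    """Round a Python float to float16 and return its two bytes (big-endian)."""
    return struct.pack('>e', value)

def shortest_roundtrip(value, max_digits=5):
    """Return the decimal with the fewest significant digits that still
    rounds to the same float16 as `value` (naive search, for illustration)."""
    target = to_half_bits(value)
    for digits in range(1, max_digits + 1):
        # Round `value` to `digits` significant decimal digits.
        candidate = float(f"{value:.{digits - 1}e}")
        try:
            if to_half_bits(candidate) == target:
                return repr(candidate)
        except (OverflowError, struct.error):
            # The rounded candidate is too large in magnitude for float16.
            continue
    return repr(value)

print(shortest_roundtrip(-65504.0))  # -65500.0
```

With one or two significant digits the candidates (-70000 and -66000) do not fit in float16 at all, so three digits is the first length that lands back on the same bits.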

You would also get this issue if you printed the first value of the array:

>>> x_array_single[0]
-65500.0

You could print the full number if you converted it to float:

>>> float(x)
-65504.0

>>> float(x_array_single[0])
-65504.0

CodePudding user response：

The internal fp16 representation of -65500 is the bytes 255 and 251. See:

import struct
struct.unpack('BB', struct.pack('e', -65500))
# (255, 251)

And so is the internal representation of -65504:

import struct
struct.unpack('BB', struct.pack('e', -65504))
# (255, 251)

In binary, and taking into account that my machine is little-endian (so it should be read as 251, then 255), that is

1 11110 1111111111

Which is sign -, exponent 30-15=15, and then the ten fraction bits, 1 (implicit) + 1/2 + 1/4 + ... + 1/1024 =

-(2**15) * (1.0 + sum(1/2**k for k in range(1,11)))

-65504 (just to state, floating point representation is an exact science ;-))
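
The decoding above can be reproduced with a small standard-library sketch. The helpers `decode_half` and `half_value` are names invented for this example, and `half_value` only handles normal numbers (no subnormals, infinities or NaN).

```python
import struct

def decode_half(value):
    """Round `value` to float16 and split its 16 bits into
    (sign, biased exponent, fraction)."""
    # '>e' packs as big-endian float16; '>H' reads the same bytes as an integer.
    (bits,) = struct.unpack('>H', struct.pack('>e', value))
    sign = bits >> 15
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits
    fraction = bits & 0x3FF          # 10 fraction bits
    return sign, exponent, fraction

def half_value(sign, exponent, fraction):
    """Rebuild the exact value of a normal float16 from its fields."""
    return (-1) ** sign * 2.0 ** (exponent - 15) * (1 + fraction / 1024)

sign, exponent, fraction = decode_half(-65500)
print(f"{sign:01b} {exponent:05b} {fraction:010b}")  # 1 11110 1111111111
print(half_value(sign, exponent, fraction))          # -65504.0
```

Using the explicit `'>e'` byte order avoids having to swap the bytes by hand on a little-endian machine.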

And the next possible number with this exponent is therefore

1 11110 1111111110

Whose value is

struct.unpack('e', b'\xfe\xfb')
# (-65472.0,)

The midpoint between -65504 and -65472 is -65488. And you can see that, indeed, numbers just below -65488 share the same fp16 representation as -65504, whereas -65488 itself and anything above it do not.

struct.unpack('BB', struct.pack('e', -65488.01))
# (255, 251)
struct.unpack('BB', struct.pack('e', -65488))
# (254, 251)

Or to use nokla's method (whose answer appeared while I was typing this one)

np.float16(-65488.01)==np.float16(-65504)
# True
np.float16(-65488)==np.float16(-65504)
# False
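
To see the whole range of integers that collapse onto the same two bytes, here is a small brute-force sketch using only `struct`. The helper name `half_bytes` is made up for this example, and the scan bounds are arbitrary values chosen to bracket -65504.

```python
import struct

def half_bytes(value):
    """Round `value` to float16 and return its two bytes (big-endian)."""
    return struct.pack('>e', value)

target = half_bytes(-65504)
matching = []
for n in range(-65536, -65470):
    try:
        if half_bytes(n) == target:
            matching.append(n)
    except (OverflowError, struct.error):
        # Too large in magnitude for float16, so struct refuses to pack it.
        pass

print(matching[0], matching[-1])  # -65519 -65489
```

So every integer from -65519 to -65489 is stored as -65504. Past that range on the negative side, `struct` raises an overflow error instead of rounding, and -65488 is a tie that rounds to -65472.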

As for your initial question (I realize both nokla and I answered "why is it not an error" or "why is it possible", but not really "why is it so"): well, I guess some displays (and only displays; from the value's point of view, all of that is the same thing) favor the "roundest" decimal representation when they have a choice among many equivalent decimal representations, whereas others favor the most central value. -65500.0 is the roundest decimal value among all the values represented by bytes (255, 251), whereas -65504 is the mean of all the values represented by those bytes.