Converting from np.float64 to np.float32 completely changes the value of some numbers

I have a numpy array of dtype=float64. When I try to convert it to float32, some values change completely. For example, I have the following array:

`test_64 = np.array([20110927.00000,20110928.00000,20110929.00000,20110930.00000,20111003.00000,20111004.00000,20111005.00000,20111006.00000,20111007.00000,20111010.00000,20111011.00000,20111012.00000,20111013.00000,20111014.00000,20111017.00000,20111018.00000,20111019.00000,20111020.00000,20111021.00000,20111024.00000,20111025.00000,20111026.00000,20111027.00000,20111028.00000,20111031.00000,20111101.00000,20111102.00000,20111103.00000,20111104.00000,20111107.00000,20111108.00000,20111109.00000,20111110.00000,20111111.00000,20111114.00000,20111115.00000,20111116.00000,20111117.00000,20111118.00000,20111121.00000,20111122.00000,20111123.00000,20111125.00000,20111128.00000,20111129.00000,20111130.00000,20111201.00000,20111202.00000,20111205.00000,20111206.00000,20111207.00000,20111208.00000,20111209.00000,20111212.00000,20111213.00000,20111214.00000,20111215.00000,20111216.00000,20111219.00000,20111220.00000,20111221.00000,20111222.00000,20111223.00000,20111227.00000,20111228.00000,20111229.00000,20111230.00000,20120103.00000,20120104.00000,20120105.00000,20120106.00000,20120109.00000,20120110.00000,20120111.00000,20120112.00000,20120113.00000,20120117.00000,20120118.00000,20120119.00000,20120120.00000,20120123.00000,20120124.00000,20120125.00000,20120126.00000,20120127.00000,20120130.00000,20120131.00000,20120201.00000,20120202.00000,20120203.00000,20120206.00000,20120207.00000,20120208.00000,20120209.00000,20120210.00000,20120213.00000,20120214.00000,20120215.00000,20120216.00000,20120217.00000])

test_32 = np.array(test_64, dtype=np.float32)`

This changes the value 20110927.00000 to 20110928.00000.

Even attempting:

np.float32(test_64[0])

results in 20110927.00000 being changed to 20110928.00000.

The same thing happens when using cupy arrays.

CodePudding user response：

Well, yes, that is how float32 works.

Shortest way to see it: float32 has a 24-bit significand, plus 1 sign bit and 8 exponent bits. That would be 33 bits in all, but the first significand bit is not stored, because it is assumed to be 1. So only 23 significand bits are stored, and the total is 32.

np.log2(20110927.)
# 24.2614762474699

So, you see the problem. You would need 25 bits to have unit precision on this number. Since you only have 24, 20110927 and 20110928 are the same number from float32's point of view.
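As a quick check of those bit counts, NumPy can report the float32 layout and the gap between neighbouring float32 values near the number from the question. This is only a small sketch; the variable names are illustrative.

```python
import numpy as np

info = np.finfo(np.float32)
stored_bits = info.nmant      # significand bits actually stored (implicit 1 excluded)
exponent_bits = info.nexp     # exponent bits

# Gap between adjacent float32 values near the number from the question.
gap = float(np.spacing(np.float32(20110928)))

# With 24 significand bits, integers stop being consecutive at 2**24.
limit = 2 ** (stored_bits + 1)

print(stored_bits, exponent_bits)                   # 23 8
print(gap)                                          # 2.0
print(limit)                                        # 16777216
print(np.float32(limit + 1) == np.float32(limit))   # True
```

A gap of 2.0 means that only every other integer in this range has its own float32 value.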

Longer answer: encode 20110927 in FP32, and then encode 20110928.

20110927 is 1.1987046599388123 × 2²⁴

So the exponent is 24. That is, 24+127=151 in the FP32 format.

Then, forgetting the first 1, which is implicit (the exponent was chosen so that the number starts with this 1.), come the 23 stored significand bits:

s=1.1987046599388123  # Implicit   1
s=s%1*2               # 0.3974... →0
s=s%1*2               # 0.7948... →0
s=s%1*2               # 1.5896... →1
s=s%1*2               # 1.1793... →1
s=s%1*2               # 0.3585... →0
s=s%1*2               # 0.7171... →0
s=s%1*2               # 1.4342... →1
s=s%1*2               # 0.8684... →0
s=s%1*2               # 1.7368... →1
s=s%1*2               # 1.4736... →1
s=s%1*2               # 0.9471... →0
s=s%1*2               # 1.8943... →1
s=s%1*2               # 1.7886... →1
s=s%1*2               # 1.5771... →1
s=s%1*2               # 1.1543... →1
s=s%1*2               # 0.3086... →0
s=s%1*2               # 0.6172... →0
s=s%1*2               # 1.2344... →1
s=s%1*2               # 0.4688... →0
s=s%1*2               # 0.9375... →0
s=s%1*2               # 1.8750... →1
s=s%1*2               # 1.7500... →1
s=s%1*2               # 1.5000... →1

(s%1 is the fractional part of a float. 1.51%1 is 0.51)
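The same steps can be written as a loop. This is a sketch of the manual procedure above, not of how NumPy converts internally; `value`, `bits` and `stored` are names chosen for illustration.

```python
value = 20110927
exponent = value.bit_length() - 1    # 24, because 2**24 <= value < 2**25
s = value / 2 ** exponent            # 1.1987046599388123

bits = []
for _ in range(23):                  # 23 stored significand bits
    s = s % 1 * 2                    # drop the integer part, shift left by one bit
    bits.append(int(s))              # the new integer part is the next bit

stored = "".join(str(b) for b in bits)
print(exponent)   # 24
print(stored)     # 00110010110111100100111
print(s)          # 1.5
```

These are the bits before any rounding. The loop ends with s exactly equal to 1.5, so the part that did not fit is exactly half a unit in the last place.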

I compute it that way, starting from 20110927/2²⁴, since that is what is encoded in base 2. But in reality, this is just the 24 most significant bits of the binary encoding of 20110927.

bin(20110927)
# 0b1001100101101111001001111

Note that those are the same bits, except for the last 1: there are 25 bits here, and we need only 24, including the implicit 1.

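And because the dropped 25th bit is 1 with nothing after it (equivalently, because the last s of my algorithm on floats is exactly 1.5), the value sits exactly halfway between two float32 values. Such a tie goes to the neighbour whose last significand bit is 0, which here means rounding up. So in the end, what is encoded is 100110010110111100101000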

(I mention this rounding for accuracy, to get an exact result. But it is not the cause of your problem. If it were not rounded up, all that would change is that, instead of having 20110927=20110928, you would have 20110927=20110926. Either way, 24 bits are not enough to distinguish two consecutive integers greater than 16777216. And rounding up is not a sure thing anyway: sometimes .5 gets rounded down.)
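To see both rounding directions, here is a small check with two odd integers from the same range. It assumes the default round-to-nearest, ties-to-even behaviour, which is what NumPy's float32 conversion uses on common hardware.

```python
import numpy as np

# Both inputs are odd, so each sits exactly halfway between two float32 values.
down = int(np.float32(20110925))   # tie resolved downwards
up = int(np.float32(20110927))     # tie resolved upwards

print(20110925, "->", down)   # 20110925 -> 20110924
print(20110927, "->", up)     # 20110927 -> 20110928
```

In both cases the result is a multiple of 4, that is, the neighbour whose last significand bit is 0.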

Dropping the implicit leading 1, and adding the sign bit (0) and the exponent (24+127=151, i.e. 10010111 in binary), gives the full pattern:

The float32 representation of 20110927.0 is 01001011100110010110111100101000

Do the same for 20110928.0... and you get the exact same result.

So, in float32, 20110927.0 and 20110928.0 (and 20110927.5, ...) are the same thing.
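The whole manual encoding can be sketched as a function and compared with what NumPy stores. `float32_bits_by_hand` and `float32_bits_numpy` are hypothetical helpers written for this illustration; the first one only handles positive integers with 2**24 <= n < 2**53.

```python
import numpy as np

def float32_bits_by_hand(n):
    """32-bit pattern of a positive integer n (2**24 <= n < 2**53) rounded to float32."""
    exponent = n.bit_length() - 1
    shift = exponent - 23                     # low bits that do not fit in 24
    significand, dropped = divmod(n, 1 << shift)
    half = 1 << (shift - 1)
    # Round to nearest, ties to the even significand.
    if dropped > half or (dropped == half and significand & 1):
        significand += 1
    if significand == 1 << 24:                # rounding carried into a new bit
        significand >>= 1
        exponent += 1
    stored = significand & ((1 << 23) - 1)    # drop the implicit leading 1
    word = ((exponent + 127) << 23) | stored  # sign bit is 0 for positive n
    return format(word, "032b")

def float32_bits_numpy(n):
    """Bit pattern NumPy actually stores, read back through a uint32 view."""
    return format(int(np.float32(n).view(np.uint32)), "032b")

hand_27 = float32_bits_by_hand(20110927)
hand_28 = float32_bits_by_hand(20110928)
print(hand_27)                                   # 01001011100110010110111100101000
print(hand_27 == hand_28)                        # True
print(hand_27 == float32_bits_numpy(20110927))   # True
```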

Another way to check that, without the theory of how float32 is encoded, is

import struct
bin(struct.unpack('i', struct.pack('f', 20110927))[0])
# 0b1001011100110010110111100101000
bin(struct.unpack('i', struct.pack('f', 20110928))[0])
# 0b1001011100110010110111100101000

Or, to see the bigger picture:

import struct
for i in range(20110901, 20110931):
    print(i, bin(struct.unpack('i', struct.pack('f', i))[0]))

20110901 0b1001011100110010110111100011010
20110902 0b1001011100110010110111100011011
20110903 0b1001011100110010110111100011100
20110904 0b1001011100110010110111100011100
20110905 0b1001011100110010110111100011100
20110906 0b1001011100110010110111100011101
20110907 0b1001011100110010110111100011110
20110908 0b1001011100110010110111100011110
20110909 0b1001011100110010110111100011110
20110910 0b1001011100110010110111100011111
20110911 0b1001011100110010110111100100000
20110912 0b1001011100110010110111100100000
20110913 0b1001011100110010110111100100000
20110914 0b1001011100110010110111100100001
20110915 0b1001011100110010110111100100010
20110916 0b1001011100110010110111100100010
20110917 0b1001011100110010110111100100010
20110918 0b1001011100110010110111100100011
20110919 0b1001011100110010110111100100100
20110920 0b1001011100110010110111100100100
20110921 0b1001011100110010110111100100100
20110922 0b1001011100110010110111100100101
20110923 0b1001011100110010110111100100110
20110924 0b1001011100110010110111100100110
20110925 0b1001011100110010110111100100110
20110926 0b1001011100110010110111100100111
20110927 0b1001011100110010110111100101000
20110928 0b1001011100110010110111100101000
20110929 0b1001011100110010110111100101000
20110930 0b1001011100110010110111100101001

Note that half of the time .5 is rounded up, and half of the time it is rounded down, which leads to this 3/1 pattern: 20110919=20110920=20110921, 20110922 is unique, 20110923=20110924=20110925, 20110926 is unique, 20110927=20110928=20110929, ...

But the important point is that there are fewer possible float32 values than there are possible 8-digit base-10 integers in this range.
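That count can be checked directly on the 30 integers printed above. This is a minimal sketch with illustrative variable names; the int32 line is only there as a comparison, since these values fit exactly in a 32-bit integer.

```python
import numpy as np

ints = np.arange(20110901, 20110931, dtype=np.int64)   # 20110901 .. 20110930
as_f32 = ints.astype(np.float32)

distinct = int(np.unique(as_f32).size)
lossless_f32 = bool(np.all(as_f32.astype(np.int64) == ints))
lossless_i32 = bool(np.all(ints.astype(np.int32) == ints))

print(ints.size, distinct)   # 30 16
print(lossless_f32)          # False
print(lossless_i32)          # True
```

Thirty different integers collapse to sixteen float32 values, so a round trip through float32 cannot recover the originals.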