numpy.histogram not behaving as expected - not using half-open intervals?

Time:04-01

Here's a very short bit of code...

import numpy as np
test = [0.4, 0.5, 0.6, 0.6, 0.0, 0.3, 0.5, 0.5, 0.8, 0.4]
np.histogram(test, bins=np.arange(0, 1 + 0.1, 0.1))

...and here's the output, where the first array is the histogram data and the second array gives the bin edges:

(array([1, 0, 1, 0, 2, 5, 0, 0, 1, 0]),
 array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]))

I'm confused by the histogram output. According to the documentation, the bins are all half-open intervals of the form [a, b), except the last one, which is [a, b]. However, this doesn't line up with the histogram data. For example, the element at index 5 is 5, which supposedly corresponds to the bin [0.5, 0.6), but there are only three numbers in this interval! Am I going mad?

Answer:

This commonly catches people by surprise. The histogram function does work as documented, but the bin edges are not quite the values they appear to be. The right edge of the bucket at index 5, printed as 0.6, is actually slightly larger than 0.6, so both 0.6 values in the data fall into [0.5, 0.6). You can see this because the following returns False:

np.arange(0, 1 + 0.1, 0.1)[6] == 0.6

For more details see: Is floating point math broken?
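
To make the offset visible, and as one possible workaround that is not part of the original answer, here is a minimal sketch. It assumes the goal is to have bin edges that compare equal to the literals 0.1, 0.2, and so on. The names arange_edges and exact_edges are just illustrative. Dividing integers by 10 gives the correctly rounded double for each edge, which is the same value the literal produces:

```python
import numpy as np

test = [0.4, 0.5, 0.6, 0.6, 0.0, 0.3, 0.5, 0.5, 0.8, 0.4]

# Edges from a floating-point step: some of them carry rounding error.
arange_edges = np.arange(0, 1 + 0.1, 0.1)
print(repr(float(arange_edges[6])))  # slightly above 0.6
print(arange_edges[6] > 0.6)         # True, so 0.6 lands in the bin to the left

# Assumed workaround: build integer edges first, then divide once.
# Each i / 10 is correctly rounded, so it equals the literal 0.i exactly.
exact_edges = np.arange(11) / 10
print(exact_edges[6] == 0.6)         # True

counts, _ = np.histogram(test, bins=exact_edges)
print(counts)                        # [1 0 0 1 2 3 2 0 1 0]
```

With these edges the three 0.5 values are counted in [0.5, 0.6) and the two 0.6 values in [0.6, 0.7). Note that np.linspace(0, 1, 11) is not a substitute here, since it also multiplies an index by a floating-point step. Rounding the edges, for example with np.round(edges, 10), is another option if the step is not a simple fraction.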
