I am perplexed as to why pd.cut gives me a first interval with a negative lower bound. The column I am cutting has a minimum value of 0, so I expected the first interval to be (0, 18) instead of (-0.18, 18).
I have set precision to 0, but that only changes the first interval to (-0.0, 18).
Also, why are all my intervals floats when the column I passed to pd.cut contains integers?
I would appreciate any help. Thank you.
CodePudding user response:
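As explained in the comments, you asked cut to define the bins automatically for you. By default the bins are equal width, and pd.cut pushes the lowest edge slightly below the minimum (by 0.1% of the data range) so that the minimum value falls inside the first interval, which is open on the left. That is why a negative bound is possible even when the minimum is 0.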
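To see where the negative edge comes from, here is a minimal sketch. The column values are hypothetical, chosen to run from 0 to 180 so that they reproduce the (-0.18, 18) interval from the question. Passing retbins=True makes pd.cut return the edges it computed alongside the binned data.

```python
import pandas as pd

# Hypothetical integer column with minimum 0 and maximum 180
s = pd.Series([0, 20, 75, 130, 180])

# retbins=True also returns the bin edges that pd.cut computed
s_cut, edges = pd.cut(s, bins=10, retbins=True)

print(edges.dtype)   # floating-point edges, even for integer input
print(edges[0])      # lower edge, padded slightly below the minimum
print(edges[-1])     # upper edge, equal to the maximum
```

The edges come from dividing the range into equal-width pieces, so they are floats regardless of the input dtype, and the lowest edge sits 0.1% of the range (here 0.18) below the minimum.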
If you wish to keep the automatic binning, you can modify the intervals manually afterwards. Here is an example for the case where only the first interval is "incorrect", using cat.rename_categories:
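import numpy as np
import pandas as pd

np.random.seed(0)
# Sample data: integers clipped so that the minimum is 0
s = pd.Series(np.random.randint(-10, 100, size=100)).clip(lower=0)
s_cut = pd.cut(s, bins=10)
print(s_cut.cat.categories)  # before

# Replace the first interval with one that starts at 0
first_I = s_cut.cat.categories[0]
new_I = pd.Interval(0, first_I.right)
s_cut = s_cut.cat.rename_categories({first_I: new_I})
print(s_cut.cat.categories)  # after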
output:
# before
IntervalIndex([(-0.095, 9.5], (9.5, 19.0], (19.0, 28.5], ...)
# after
IntervalIndex([(0.0, 9.5], (9.5, 19.0], (19.0, 28.5], ...)
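If you do not need the automatic binning, another option is to pass explicit edges yourself. The sketch below is one possible way to do that, not the only one: the bin width of 10 and the name edges are arbitrary choices for illustration, and right=False makes each interval closed on the left so that 0 belongs to the first bin.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
s = pd.Series(np.random.randint(-10, 100, size=100)).clip(lower=0)

# Assumed bin width of 10; the last edge is placed above the maximum
# so that the largest value still falls inside a left-closed interval
edges = list(range(0, int(s.max()) + 11, 10))

# right=False gives intervals of the form [0, 10), [10, 20), ...
s_cut = pd.cut(s, bins=edges, right=False)
print(s_cut.cat.categories)
```

Because the edges are given as integers, the resulting intervals should keep integer bounds, and the first interval starts exactly at 0 without any renaming.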