I have a data frame and I am creating bins with pd.qcut as follows:
us_counties['bins'] = pd.qcut(us_counties['economic connectedness'], q=10,precision=2)
The bins are:
us_counties.bins.cat.categories
IntervalIndex([(0.27999999999999997, 0.58], (0.58, 0.67], (0.67, 0.72], (0.72, 0.76], (0.76, 0.81], (0.81, 0.85], (0.85, 0.9], (0.9, 0.97], (0.97, 1.06], (1.06, 1.36]], dtype='interval[float64, right]')
I want to change their format so that the first bin is <0.58, the middle ones look like 0.67 - 0.72, and the last one is >1.06.
I managed to format the middle ones with the following command:
us_counties.bins.cat.categories = [f'{i.left} - {i.right}' for i in us_counties.bins.cat.categories]
How can I change the first and last ones so that I end up with bins that look like:
['<0.58','0.58 - 0.67',....,'0.97 - 1.06','>1.06']
Answer:
You probably cannot do this with qcut alone, because qcut distributes data points into equal-sized buckets, and relabelling the edges your way would not preserve that equality. For example, qcut includes 0.58 in the first bin (its intervals are right-closed, so the first bin is (0.28, 0.58]), but a label of <0.58 implies that 0.58 is excluded. What you can do instead is write a function that assigns a bin label to each row and store the result in a new column using apply. You can then work with that new column however you like. That is what I would suggest, if I understand your case correctly.
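A minimal sketch of that apply-based approach, assuming made-up sample data in place of the real us_counties frame and a hypothetical helper named label_bin. It borrows qcut's decile edges (via retbins=True), rounds them to two decimals, and then labels each value itself: the outer bins are strict (<0.58 really excludes 0.58) and the middle bins are right-closed. Because the edges are rounded, membership can differ slightly from qcut's own bins, and the result is a plain text column rather than an ordered categorical.

```python
import bisect

import numpy as np
import pandas as pd

# Made-up sample data standing in for the real us_counties frame.
us_counties = pd.DataFrame({"economic connectedness": np.linspace(0.28, 1.36, 50)})

# Borrow qcut's decile edges, round them to two decimals, keep only the inner ones.
_, edges = pd.qcut(us_counties["economic connectedness"], q=10, retbins=True)
inner = edges.round(2)[1:-1].tolist()


def label_bin(value, inner):
    """Hypothetical helper: turn one value into a text bin label."""
    if value < inner[0]:
        return f"<{inner[0]}"   # strictly below the first inner edge
    if value > inner[-1]:
        return f">{inner[-1]}"  # strictly above the last inner edge
    # bisect_left counts the edges strictly below `value`, so the middle bins are
    # (left, right]; a value exactly equal to inner[0] goes to the first middle bin.
    idx = max(bisect.bisect_left(inner, value), 1)
    return f"{inner[idx - 1]} - {inner[idx]}"


# New column holding the text label for every row.
us_counties["bin_label"] = us_counties["economic connectedness"].apply(
    lambda v: label_bin(v, inner)
)
print(us_counties["bin_label"].value_counts(sort=False))
```

Handling the boundaries in your own function is the point here: you decide explicitly which side of each edge a tie lands on, instead of inheriting qcut's right-closed intervals.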
Answer:
How about something like this?
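# Build "left - right" labels for every interval first
mybinlabels = [f'{i.left} - {i.right}' for i in us_counties.bins.cat.categories]
# Then overwrite the first and last labels with open-ended ones
mybinlabels[0] = '<' + str(us_counties.bins.cat.categories[0].right)
mybinlabels[-1] = '>' + str(us_counties.bins.cat.categories[-1].left)
# rename_categories returns a new Series; newer pandas versions no longer support assigning to .cat.categories directly
us_counties['bins'] = us_counties['bins'].cat.rename_categories(mybinlabels)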
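As an alternative sketch (not part of the answer above), the same labels can be passed straight to qcut through its labels parameter, after a first call with retbins=True to read the decile edges. This keeps qcut's equal-sized buckets and only changes how they are named. The sample data below is made up, and the sketch assumes the rounded edges stay distinct, since duplicate labels would most likely be rejected by qcut.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for the real us_counties frame.
us_counties = pd.DataFrame({"economic connectedness": np.linspace(0.28, 1.36, 50)})
col = us_counties["economic connectedness"]

# First pass: only read the 11 decile edges (retbins=True returns them as an array).
_, edges = pd.qcut(col, q=10, retbins=True)
e = edges.round(2).tolist()

# Open-ended labels for the two outer bins, "left - right" for the eight inner ones.
labels = (
    [f"<{e[1]}"]
    + [f"{a} - {b}" for a, b in zip(e[1:-2], e[2:-1])]
    + [f">{e[-2]}"]
)

# Second pass: the same quantile cut, but named with our own labels.
us_counties["bins"] = pd.qcut(col, q=10, labels=labels)
print(us_counties["bins"].cat.categories.tolist())
```

Note that, as the first answer points out, a value sitting exactly on an edge such as 0.58 still lands in the "<0.58" bin here, because qcut's intervals stay right-closed; only the names change.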