I want to alter bin ranges format in pandas

Time:11-05

I have a data frame and I am creating bins with pd.qcut as follows:

us_counties['bins'] = pd.qcut(us_counties['economic connectedness'], q=10,precision=2)

The bins are:

us_counties.bins.cat.categories
IntervalIndex([(0.27999999999999997, 0.58], (0.58, 0.67], (0.67, 0.72], (0.72, 0.76], (0.76, 0.81], (0.81, 0.85], (0.85, 0.9], (0.9, 0.97], (0.97, 1.06], (1.06, 1.36]], dtype='interval[float64, right]')

I want to change their format so that the first bin is <0.58, the middle ones look like 0.67 - 0.72, and the last one is >1.06.

I managed to format the middle ones with the following command:

us_counties.bins.cat.categories = [f'{i.left} - {i.right}' for i in us_counties.bins.cat.categories]

How can I change the first and last ones, so that I end up with bins that look like:

['<0.58','0.58 - 0.67',....,'0.97 - 1.06','>1.06']

CodePudding user response:

You probably cannot do this with qcut alone, because qcut distributes the data points into equal-sized buckets, and with your labels that equality would not be maintained. For example, qcut treats 0.58 as inclusive, but you want it to be exclusive. What you can do instead is write a function, create another column, and assign a bin to each row with apply. Based on the new column, you can then do whatever you need in the next step. This is my suggestion, assuming I understand your case correctly.
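The function-plus-apply idea could be sketched roughly as follows. This is only an illustration: the helper name label_bin, the column names value and bin_label, and the sample numbers are all made up here, and the intervals still come from qcut, so only the display label changes, not which rows fall into which bin.

```python
import pandas as pd


def label_bin(interval, first, last):
    """Hypothetical helper: turn one qcut interval into a display label."""
    if interval == first:
        return f"<{interval.right}"
    if interval == last:
        return f">{interval.left}"
    return f"{interval.left} - {interval.right}"


# Made-up values standing in for us_counties['economic connectedness']
df = pd.DataFrame({"value": [0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]})
df["bins"] = pd.qcut(df["value"], q=5, precision=2)

cats = df["bins"].cat.categories
# Cast to object so apply sees plain Interval objects row by row
df["bin_label"] = df["bins"].astype(object).apply(
    lambda iv: label_bin(iv, cats[0], cats[-1])
)
print(df)
```

The original bins column is left untouched, so the intervals remain available for any later numeric work.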

CodePudding user response:

How about something like this?

cats = us_counties.bins.cat.categories
mybinlabels = [f'{i.left} - {i.right}' for i in cats]
# Overwrite the first and last labels with open-ended versions
mybinlabels[0] = f'<{cats[0].right}'
mybinlabels[-1] = f'>{cats[-1].left}'
# rename_categories also works on pandas versions where .cat.categories cannot be assigned directly
us_counties['bins'] = us_counties['bins'].cat.rename_categories(mybinlabels)
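Another option, sketched here under some assumptions, is to ask qcut for the bin edges first (retbins=True), build the label strings from those edges, and then pass them back through the labels argument. The random data below is invented purely so the snippet runs on its own, and rounding the edges to two decimals is a display choice that assumes the rounded edges stay distinct.

```python
import numpy as np
import pandas as pd

# Invented data standing in for the real us_counties frame
rng = np.random.default_rng(0)
us_counties = pd.DataFrame(
    {"economic connectedness": rng.uniform(0.28, 1.36, size=200)}
)
col = us_counties["economic connectedness"]

# First pass: only used to learn the decile edges (11 edges for 10 bins)
_, edges = pd.qcut(col, q=10, retbins=True)
edges = np.round(edges, 2)

# One label per bin, with open-ended labels at both ends
labels = [f"{lo} - {hi}" for lo, hi in zip(edges[:-1], edges[1:])]
labels[0] = f"<{edges[1]}"
labels[-1] = f">{edges[-2]}"

# Second pass: same deciles, but with the custom labels attached
us_counties["bins"] = pd.qcut(col, q=10, labels=labels)
print(us_counties["bins"].cat.categories)
```

Because the labels are attached at binning time, no renaming step is needed afterwards. The labels are cosmetic only: the first bin still includes its right edge, as in the original intervals.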