Home > database >  Pandas cut and specifying specific bin sizes
Pandas cut and specifying specific bin sizes

Time:03-22

I have just been playing with cut and specifying specific bin sizes but sometimes I was getting incorrect data in my bins.

What I want to do is bin data depending on where it falls in my Risk Impact matrix.

Risk Evaluation Matrix

Below is the original code I used to create my dataFrame and allocate my bins and labels.

risk = {'Gross Risk': {0: 1, 1: 3, 2: 4, 3: 6, 4: 9, 5: 14, 6: 20, 7: 5, 8: 8, 9: 16, 10: 22, 11: 1, 12: 3, 13: 6, 14: 9, 15: 12, 16: 25}}
df = pd.DataFrame.from_dict(risk)

# Create a list of names we will call the bins
group_names = ['Very Low', 'Low', 'Medium', 'High', 'Very High']

# Specify our bin boundaries
evaluation_bins = [1, 4, 8, 12, 20, 25]

# And stitch it all together
df['Risk Evaluation'] = pd.cut(df['Gross Risk'], bins = evaluation_bins, labels = group_names, include_lowest = True)

This creates the following output

Dataframe output after applying bins

Checking this off against my Risk Matrix I can see that rows 6, 7, 8 and 15 are incorrectly segregated.

To fix this I then respecified the Evaluation_Bins data. Instead of taking the lower bound of the bin, I specified the upper bound.

evaluation_bins = [1, 3, 6, 10, 16, 25]

This gives me the results I want. But either I fundamentally misunderstood how to specify the Pandas cut boundaries, I thought I had to specify the lower limit, not the higher, or I have just fluked my way through to the result I wanted.

Correct result

Could someone put my mind at rest and let me know if I miss understood how the creation of fixed width bins worked or not?

Be nice :)

CodePudding user response:

Change evaluation_bins and use right=False as parameter of pd.cut:

evaluation_bins = [1, 4, 8, 12, 20, np.inf]

df['Risk Evaluation2'] = pd.cut(df['Gross Risk'], bins=evaluation_bins, 
                                labels=group_names, include_lowest=True, right=False)
print(df)

# Output
    Gross Risk Risk Evaluation Risk Evaluation2
0            1        Very Low         Very Low
1            3        Very Low         Very Low
2            4        Very Low              Low
3            6             Low              Low
4            9          Medium           Medium
5           14            High             High
6           20            High        Very High
7            5             Low              Low
8            8             Low           Medium
9           16            High             High
10          22       Very High        Very High
11           1        Very Low         Very Low
12           3        Very Low         Very Low
13           6             Low              Low
14           9          Medium           Medium
15          12          Medium             High
16          25       Very High        Very High
  • Related