I have just been playing with cut and specifying specific bin sizes but sometimes I was getting incorrect data in my bins.
What I want to do is bin data depending on where it falls in my Risk Impact matrix.
Below is the original code I used to create my dataFrame and allocate my bins and labels.
risk = {'Gross Risk': {0: 1, 1: 3, 2: 4, 3: 6, 4: 9, 5: 14, 6: 20, 7: 5, 8: 8, 9: 16, 10: 22, 11: 1, 12: 3, 13: 6, 14: 9, 15: 12, 16: 25}}
df = pd.DataFrame.from_dict(risk)
# Create a list of names we will call the bins
group_names = ['Very Low', 'Low', 'Medium', 'High', 'Very High']
# Specify our bin boundaries
evaluation_bins = [1, 4, 8, 12, 20, 25]
# And stitch it all together
df['Risk Evaluation'] = pd.cut(df['Gross Risk'], bins = evaluation_bins, labels = group_names, include_lowest = True)
This creates the following output
Checking this off against my Risk Matrix I can see that rows 6, 7, 8 and 15 are incorrectly segregated.
To fix this I then respecified the Evaluation_Bins data. Instead of taking the lower bound of the bin, I specified the upper bound.
evaluation_bins = [1, 3, 6, 10, 16, 25]
This gives me the results I want. But either I fundamentally misunderstood how to specify the Pandas cut boundaries, I thought I had to specify the lower limit, not the higher, or I have just fluked my way through to the result I wanted.
Could someone put my mind at rest and let me know if I miss understood how the creation of fixed width bins worked or not?
Be nice :)
CodePudding user response:
Change evaluation_bins
and use right=False
as parameter of pd.cut
:
evaluation_bins = [1, 4, 8, 12, 20, np.inf]
df['Risk Evaluation2'] = pd.cut(df['Gross Risk'], bins=evaluation_bins,
labels=group_names, include_lowest=True, right=False)
print(df)
# Output
Gross Risk Risk Evaluation Risk Evaluation2
0 1 Very Low Very Low
1 3 Very Low Very Low
2 4 Very Low Low
3 6 Low Low
4 9 Medium Medium
5 14 High High
6 20 High Very High
7 5 Low Low
8 8 Low Medium
9 16 High High
10 22 Very High Very High
11 1 Very Low Very Low
12 3 Very Low Very Low
13 6 Low Low
14 9 Medium Medium
15 12 Medium High
16 25 Very High Very High