I'm trying to compute counts for several "bins": how many students got a grade of 0, how many got a grade greater than 0 and less than 60, and so on.
I'm not sure whether they count as bins, since the ranges are not equally sized.
grade == 0
0 < grade < 60
60 <= grade < 70
...
Here is the code:
grade_list = [87.5, 87.5, 65.0, 90.0, 72.5, 65.0, 0.0, 65.0, 72.5, 65.0, 72.5, 65.0, 90.0, 90.0, 87.5, 87.5, 87.5, 65.0, 87.5, 65.0, 65.0, 90.0, 99.0, 65.0, 87.5, 65.0, 87.5, 90.0, 87.5, 90.0, 90.0, 0.0, 90.0, 99.0, 65.0, 87.5, 72.5, 72.5, 90.0, 0.0, 65.0, 72.5, 90.0, 90.0, 65.0, 90.0, 90.0, 65.0, 65.0, 0.0, 90.0, 90.0, 100.0, 99.0, 65.0, 90.0, 90.0, 0.0, 99.0, 90.0, 100.0, 87.5, 65.0, 99.0, 0.0, 90.0, 65.0, 90.0, 65.0, 99.0, 90.0, 65.0, 100.0, 65.0, 90.0, 99.0]
import pandas as pd
df = pd.DataFrame({'grade': grade_list})
print(len(df[df['grade']==0]))
print(len(df[(df['grade']>0)&(df['grade']<60)]))
print(len(df[(df['grade']>=60)&(df['grade']<70)]))
print(len(df[(df['grade']>=70)&(df['grade']<80)]))
print(len(df[(df['grade']>=80)&(df['grade']<90)]))
print(len(df[(df['grade']>=90)]))
I got what I want. The code seems ugly though. Is there a better way to do the job?
CodePudding user response:
IIUC, you can use pandas.cut, tweaking it a bit to handle the 0 as a separate group:
import pandas as pd

df = pd.DataFrame({'grade': grade_list})
bins = [0,60,70,80,90]
labels = [f'≥{x}' if x>0 else f'>{x}' for x in bins]
# Replace exact zeros with -1 so they fall into the first (-inf, 0) bin, labelled '0'
df['bin'] = pd.cut(df['grade'].replace(0, -1),
                   bins=[float('-inf')] + bins + [float('inf')],
                   labels=['0'] + labels,
                   right=False)
Output (two extra data points, 10.0 and 0.0, were added for the example):
    grade  bin
0    87.5  ≥80
1    87.5  ≥80
2    65.0  ≥60
3    90.0  ≥90
4    72.5  ≥70
..    ...  ...
73   65.0  ≥60
74   90.0  ≥90
75   99.0  ≥90
76   10.0   >0
77    0.0    0
[78 rows x 2 columns]
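To get the per-bin counts the question asks for, the binned values can be tallied with value_counts. A minimal sketch, assuming a small made-up sample rather than the full grade_list, with the names grades, edges and counts chosen only for illustration:

```python
import pandas as pd

# Small illustrative sample, not the full grade_list from the question
grades = pd.Series([0.0, 10.0, 65.0, 72.5, 87.5, 90.0, 100.0, 0.0, 65.0])

edges = [float('-inf'), 0, 60, 70, 80, 90, float('inf')]
labels = ['0', '>0', '≥60', '≥70', '≥80', '≥90']

# Move exact zeros below the first edge so they land in their own bin
binned = pd.cut(grades.replace(0, -1), bins=edges, labels=labels, right=False)

# sort_index() orders the counts by bin rather than by frequency
counts = binned.value_counts().sort_index()
print(counts)
```

Because the result of pd.cut is categorical, bins with no students still appear in the counts with a value of 0.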
CodePudding user response:
Try this:
import numpy as np

# Tens digit of each grade, e.g. 87.5 -> 8 and 100.0 -> 10
df['category'] = (df['grade']/10).astype(int)
# Collapse categories 1 through 5 into 1, so the remaining categories are 0, 1, 6, 7, ..., 10
df['category'] = np.where((df.category > 0) & (df.category < 6), 1, df.category)
for i in range(max(df.category) + 1):
    if len(df[df['category']==i]) > 0:
        print(i, len(df[df['category']==i]))
This gives you categories that correspond to the ranges you want and prints the number of rows in each category. The if statement is only there to skip empty categories; you can remove it.
Output:
The dataframe:
    grade  category
0    87.5         8
1    87.5         8
2    65.0         6
3    90.0         9
4    72.5         7
..    ...       ...
71   65.0         6
72  100.0        10
73   65.0         6
74   90.0         9
75   99.0         9
Size of each bin (category, count):
0 6
6 21
7 6
8 11
9 29
10 3
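Note that dividing by 10 and truncating puts grades strictly between 0 and 10 into category 0, and a grade of 100 into its own category 10, as the sizes above show. If that is not wanted, one possible variant masks on the grade itself and caps the category at 9. This is a sketch on made-up sample grades, not the answer's original code:

```python
import pandas as pd

# Small illustrative sample, not the full grade_list from the question
grades = pd.Series([0.0, 5.0, 10.0, 65.0, 72.5, 87.5, 90.0, 100.0])

# Tens digit as the starting category
category = (grades // 10).astype(int)
# Cap at 9 so that a grade of 100 is counted with the 90s
category = category.clip(upper=9)
# Any positive grade below 60 goes to category 1, including grades under 10
category = category.mask((grades > 0) & (grades < 60), 1)

counts = category.value_counts().sort_index()
print(counts)
```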