Count the range of each value in Python-CodePudding

I have dataset of student's scores for each subject.

StuID  Subject Scores                
1      Math    90
1      Geo     80
2      Math    70
2      Geo     60
3      Math    50
3      Geo     90

Now I want to count the range of scores for each subject like 0< x <=20, 20< x <=30 and get a dataframe like this:

Subject  0-20  20-40 40-60 60-80 80-100                 
Math       0     0     1     1     1
Geo        0     0     0     1     2

The given dataset is just a sample of the data I am working on. My dataset has more than 1000 line. How can I do it? Thank you!

CodePudding user response：

Assuming a pandas DataFrame df:

import pandas as pd

bins = list(range(0, 100 1, 20))
# [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
labels = [f'{a}-{b}' for a,b in zip(bins, bins[1:])]
# ['0-10', '10-20', '20-30', '30-40', '40-50', '50-60', '60-70', '70-80', '80-90', '90-100']

out = (pd.crosstab(df['Subject'], pd.cut(df['Scores'],
                                         bins=bins, labels=labels,
                                         ordered=True, right=False))
          .reindex(labels, axis=1, fill_value=0)
        #  .reset_index().rename_axis(columns=None) # optional
      )

output:

Scores   0-20  20-40  40-60  60-80  80-100
Subject                                   
Geo         0      0      0      1       2
Math        0      0      1      1       1

CodePudding user response：

First, you can create bins:

bins = [0, 19, 39, 59, 79, 100]
df['binned'] = pd.cut(df['Scores'], bins)

>>    StuID Subject  Scores     binned
>> 0      1    Math      90  (79, 100]
>> 1      1     Geo      80  (79, 100]
>> 2      2    Math      70   (59, 79]
>> 3      2     Geo      60   (59, 79]
>> 4      3    Math      50   (39, 59]
>> 5      3     Geo      90  (79, 100]

Then pivot it:

df.pivot_table(index='Subject', columns='binned', values='StuID', aggfunc=pd.Series.count)

>> binned   (39, 59]  (59, 79]  (79, 100]
>> Subject                               
>> Geo           NaN       1.0        2.0
>> Math          1.0       1.0        1.0