Generating column mode in data panel

Time:03-29

I have an unbalanced panel dataset in which, in each period, a student can hold a certain level/type of scholarship:

head(df)

ID     student_period         scholarship
   
4567        1              scholarship_level_1
4567        2              scholarship_level_2
4567        3              scholarship_level_2
4567        4              scholarship_level_3
5478        4              scholarship_level_3
5478        5              scholarship_level_3
6758        7              scholarship_level_1
6758        8              scholarship_level_2
6758        9              scholarship_level_2

Basically, I want to create a new variable that holds the statistical mode of the scholarship level for each student ID in this panel. Something like this:

head(df1)

ID     student_period         scholarship            scholarship_mode
   
4567        1              scholarship_level_1      scholarship_level_2
4567        2              scholarship_level_2      scholarship_level_2 
4567        3              scholarship_level_2      scholarship_level_2
4567        4              scholarship_level_3      scholarship_level_2
5478        4              scholarship_level_3      scholarship_level_3
5478        5              scholarship_level_3      scholarship_level_3
6758        7              scholarship_level_1      scholarship_level_2
6758        8              scholarship_level_2      scholarship_level_2
6758        9              scholarship_level_2      scholarship_level_2


Any ideas?

CodePudding user response:

You can use groupby transform and value_counts:

df['scholarship_mode'] = (df.groupby('ID')['scholarship']
                          .transform(lambda x: x.value_counts().index[0]))

or mode:

df['scholarship_mode'] = (df.groupby('ID')['scholarship']
                          .transform(lambda x: x.mode().iloc[0]))

output:

     ID  student_period          scholarship     scholarship_mode
0  4567               1  scholarship_level_1  scholarship_level_2
1  4567               2  scholarship_level_2  scholarship_level_2
2  4567               3  scholarship_level_2  scholarship_level_2
3  4567               4  scholarship_level_3  scholarship_level_2
4  5478               4  scholarship_level_3  scholarship_level_3
5  5478               5  scholarship_level_3  scholarship_level_3
6  6758               7  scholarship_level_1  scholarship_level_2
7  6758               8  scholarship_level_2  scholarship_level_2
8  6758               9  scholarship_level_2  scholarship_level_2

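NB. Be aware that mode and value_counts can both produce ties, in which case only one of the tied values is used. Series.mode() returns tied values in sorted order, so .iloc[0] picks the smallest one; with value_counts, the order among tied values should not be relied on.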
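To make the note on ties concrete, here is a minimal self-contained sketch. It rebuilds the example frame from the question and adds a hypothetical student (ID 9999, not in the original data) whose two periods have different levels, so the mode is tied. The column name mode_is_tied is also an assumption introduced for illustration. It flags the IDs where the reported mode is one of several tied values, so they can be reviewed rather than resolved silently.

```python
import pandas as pd

# Example data from the question, plus a hypothetical tied student (ID 9999).
df = pd.DataFrame({
    "ID": [4567, 4567, 4567, 4567, 5478, 5478, 6758, 6758, 6758, 9999, 9999],
    "student_period": [1, 2, 3, 4, 4, 5, 7, 8, 9, 1, 2],
    "scholarship": [
        "scholarship_level_1", "scholarship_level_2", "scholarship_level_2",
        "scholarship_level_3", "scholarship_level_3", "scholarship_level_3",
        "scholarship_level_1", "scholarship_level_2", "scholarship_level_2",
        "scholarship_level_3", "scholarship_level_1",
    ],
})

# Per-ID mode, broadcast back to every row of the group.
# Series.mode() returns tied values sorted, so .iloc[0] is the smallest of them.
df["scholarship_mode"] = (df.groupby("ID")["scholarship"]
                          .transform(lambda s: s.mode().iloc[0]))

# Number of distinct modes per ID; more than one means the mode is tied.
n_modes = df.groupby("ID")["scholarship"].agg(lambda s: s.mode().size)
df["mode_is_tied"] = df["ID"].map(n_modes > 1)

print(df)
```

For the tied student the mode column holds scholarship_level_1 (the smallest of the tied values in sort order) and mode_is_tied is True. If a different rule makes more sense for your data, for example preferring the most recent period, the lambda is the place to encode it.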
