Home > OS >  Calculating mean grades for deciles within a dataset with Python, grouped by another field
Calculating mean grades for deciles within a dataset with Python, grouped by another field

Time:02-11

import pandas as pd  
import csv



df_orig = pd.read_csv('test_sample.csv')  
df_orig = df_orig[(df_orig['number']>0)]  
decile_stats = df_orig.groupby(pd.qcut(df_orig.number, 5))['number'].mean()  


print(decile_stats)

I'm trying to use python to calculate statistics for deciles of my dataset. I can calculate the mean of each decile using qcut, but I want to group my numbers by the values in a second column. This way the deciles are calculated and reported on values according to their value in the family column.

family number
0 1000 0.04
1 1000 0.20
2 1000 0.04
3 1000 0.16
4 1000 0.08
5 1000 0.02
6 1000 0.02
7 1000 0.02
8 1000 0.64
9 1000 0.04

My desired output would be:

Q1 1000 0.028617
Q2 1000 0.105060
Q3 1000 0.452467
Q4 1000 2.644886
Q5 1000 141.749797...

etc. with each 'family' shown, 1000, 2000, 3000 in this case.

CodePudding user response:

IIUC, you can use:

N = 3
labels = [f'Q{i}' for i in range(1, N 1)]
decile = lambda x: x.groupby(pd.qcut(x['number'], N, labels=labels)).mean()

out = df.groupby('family').apply(decile)['number'].rename('mean').reset_index()

Output:

>>> out
   family number      mean
0    1000     Q1  0.030000
1    1000     Q2  0.080000
2    1000     Q3  0.333333
  • Related