Home > Software engineering >  elegant way to agg and transform together in pandas groupby
elegant way to agg and transform together in pandas groupby

Time:12-20

I have a dataframe like as below

df = pd.DataFrame(
    {'stud_id' : [101, 101, 101, 101, 
                  101, 101, 101, 101],
     'sub_code' : ['CSE01', 'CSE01', 'CSE01', 
                   'CSE01', 'CSE02', 'CSE02',
                   'CSE02', 'CSE02'],
     'ques_date' : ['13/11/2020', '10/1/2018','11/11/2017', '27/03/2016', 
                '13/05/2010',  '10/11/2008','11/1/2007', '27/02/2006'],
     'marks' : [77, 86, 55, 90, 
                65, 90, 80, 67]}
)
df['ques_date'] = pd.to_datetime(df['ques_date'])

I would like to do the below

a) group the data by stud_id and sub_code

b) Compute the mean difference ques_date for each group

c) Compute the count of marks for each group

So, I tried the below and it works fine

df['avg_ques_gap'] = (df.groupby(['stud_id','sub_code'])['ques_date']
                          .transform(lambda x: x.diff().dt.days.median()))

output = df.groupby(['stud_id','sub_code']).agg(last_ques_date=('ques_date','max'),
                                                  total_pos_transactions=('marks','count')).reset_index()

But you can see that I write two lines. one for transform and other for aggregate function.

Is there anyway to write both transform and aggregate in single line?

I expect my ouptut to be like as below

enter image description here

CodePudding user response:

Because there is added median lambda function aggregate values, so possible add to named aggregation:

f = lambda x: x.diff().dt.days.median()
output = (df.groupby(['stud_id','sub_code'], as_index=False)
           .agg(last_ques_date=('ques_date','max'),
                 total_pos_transactions=('marks','count'),
                 avg_ques_gap = ('ques_date',f)))
print (output)
   stud_id sub_code last_ques_date  total_pos_transactions  avg_ques_gap
0      101    CSE01     2020-11-13                       4        -594.0
1      101    CSE02     2010-05-13                       4        -579.0
  • Related