assign not working in grouped pandas dataframe

Time:05-31

In a complex method chain using pandas, one of the steps groups the data by a column and then calculates some metrics. This is a simplified example of what I want to achieve. I have many more assignments in the workflow, but it fails miserably at the first one.

import pandas as pd
import numpy as np

data = pd.DataFrame({'Group':['A','A','A','B','B','B'],'first':[1,12,4,5,4,3],'last':[5,3,4,5,2,7]})

data.groupby('Group').assign(average_ratio=lambda x: np.mean(x['first']/x['last']))


>>>> AttributeError: 'DataFrameGroupBy' object has no attribute 'assign'

I know I could use apply this way:

data.groupby('Group').apply(lambda x: np.mean(x['first']/x['last']))
Group
A    1.733333
B    1.142857
dtype: float64

or, much better, naming the column in the same step:

data.groupby('Group').apply(lambda x: pd.Series({'average_ratio':np.mean(x['first']/x['last'])}))

       average_ratio
Group               
A           1.733333
B           1.142857

Is there any way of using .assign to obtain the same result?

CodePudding user response:

To answer your last question: no, for your needs you cannot. DataFrame.assign simply adds new columns or replaces existing ones, and it returns a DataFrame with the same index as the original, plus the new or adjusted columns.
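As a rough illustration of that point, here is a minimal sketch of what assign can do with grouped data: it can attach a per-group value to every original row through groupby(...).transform, but the result still has one row per input row. The column name ratio and the variable with_ratio are my own additions, not part of the question.

```python
import pandas as pd

data = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'first': [1, 12, 4, 5, 4, 3],
    'last': [5, 3, 4, 5, 2, 7],
})

# assign never changes the index: each lambda receives the whole frame
# and must return something aligned with its rows.
with_ratio = (
    data
    .assign(ratio=lambda d: d['first'] / d['last'])  # 'ratio' is a helper column (assumption)
    .assign(average_ratio=lambda d: d.groupby('Group')['ratio'].transform('mean'))
)

# Still six rows; the group mean is repeated on every row of its group.
print(with_ratio)
```

This is useful when you want the group metric next to the unit-level rows, but it is not the two-row aggregated result asked for in the question.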

You are attempting a grouped aggregation, which reduces the rows to group level and thereby changes the index and the DataFrame's granularity from unit level to aggregated group level. Therefore you need to run your groupby operations without assign.
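If you still want to stay in a single chain, one possible arrangement (a sketch, not the only way) is to let assign do the row-level part before the groupby and leave the reduction to agg with named aggregation. The helper column ratio and the variable result are names I made up for the example.

```python
import pandas as pd

data = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'first': [1, 12, 4, 5, 4, 3],
    'last': [5, 3, 4, 5, 2, 7],
})

result = (
    data
    .assign(ratio=lambda d: d['first'] / d['last'])  # row-level step: index unchanged
    .groupby('Group')
    .agg(average_ratio=('ratio', 'mean'))            # reduction: one row per group
)

# result is indexed by Group, with A ≈ 1.733333 and B ≈ 1.142857
print(result)
```

Named aggregation also names the output column in the same step, which is what the pd.Series trick in the question was doing.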

To encapsulate multiple assigned aggregate columns in a way that fits a chained process, define a function and apply it to the groups:

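def aggfunc(group):
    # `group` is the sub-DataFrame for one 'Group' value, not a single row
    group = group.copy()  # work on a copy rather than the object pandas passes in
    group['first_mean'] = np.mean(group['first'])
    group['last_mean'] = np.mean(group['last'])
    group['average_ratio'] = np.mean(group['first'].div(group['last']))

    return group


# Each group's metrics are repeated on every row of that group
agg_data = data.groupby('Group').apply(aggfunc)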
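Note that a function written this way returns every original row with the metrics repeated. If you would rather end up with one row per group, as in the question, a variant could return a pd.Series of metrics instead. This is only a sketch: the names summarize and summary are hypothetical, and selecting ['first', 'last'] before apply is my own choice to keep the grouping column out of the function.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'first': [1, 12, 4, 5, 4, 3],
    'last': [5, 3, 4, 5, 2, 7],
})

def summarize(group):
    # Returning a Series makes apply emit one row per group,
    # with the Series index becoming the column names.
    return pd.Series({
        'first_mean': np.mean(group['first']),
        'last_mean': np.mean(group['last']),
        'average_ratio': np.mean(group['first'].div(group['last'])),
    })

# Only the value columns are handed to the function.
summary = data.groupby('Group')[['first', 'last']].apply(summarize)
print(summary)
```

Further chained steps can then continue from summary, for example with assign on the aggregated frame.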