Pandas group by function to do different methods if index in list-CodePudding

I'm wondering if its possible to create your own groupby function that runs a different method for the output in a single column depending on if the index is in some list or not. For example:

df = pd.DataFrame({'ID' : [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                   'Data' : [5, 7, 6, 13, 14, 11, 10, 2, 4, 3]})

some_list = [2, 3]

I want to group by ID column, and return an average of the Data column (df.groupby('ID').mean() for most values) However, if ID is in some_list then I would like the average to be calculated as the sum of Data divided by 4 (df.groupby('ID').sum()/4). The output for the above would look as below:

   ID   Data
0   1   6.00
1   2  12.00
2   3   2.25

I know I could do both methods separately and join into one column after doing the groupby, but I was wondering if its possible to do this in one step? Maybe with df.groupby('ID').apply(function)?

I've looked at this question, but it didn't help me.

CodePudding user response：

Try groupby with apply and a condition:

df.groupby('ID', as_index=False)['Data'].apply(lambda x: x.sum() / 4 if x.name in some_list else x.mean())

Output:

   ID   Data
0   1   6.00
1   2  12.00
2   3   2.25

CodePudding user response：

If performance is important dont use groupby.apply, you can filter rows for aggregate sum with division and for aggregate mean:

s = df[df['ID'].isin(some_list)].groupby('ID')['Data'].sum().div(4)
df = s.combine_first(df.groupby('ID')['Data'].mean()).reset_index()

print (df)
   ID   Data
0   1   6.00
1   2  12.00
2   3   2.25

Here is alternative solution:

df = df.groupby('ID')['Data'].agg(['sum','mean']).reset_index()
df['Value'] = np.where(df['ID'].isin(some_list), df.pop('sum').div(4), df.pop('mean'))
print (df)
   ID  Value
0   1   6.00
1   2  12.00
2   3   2.25