Home > Mobile >  Pandas group by function to do different methods if index in list
Pandas group by function to do different methods if index in list

Time:09-24

I'm wondering if its possible to create your own groupby function that runs a different method for the output in a single column depending on if the index is in some list or not. For example:

df = pd.DataFrame({'ID' : [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                   'Data' : [5, 7, 6, 13, 14, 11, 10, 2, 4, 3]})

some_list = [2, 3]

I want to group by ID column, and return an average of the Data column (df.groupby('ID').mean() for most values) However, if ID is in some_list then I would like the average to be calculated as the sum of Data divided by 4 (df.groupby('ID').sum()/4). The output for the above would look as below:

   ID   Data
0   1   6.00
1   2  12.00
2   3   2.25

I know I could do both methods separately and join into one column after doing the groupby, but I was wondering if its possible to do this in one step? Maybe with df.groupby('ID').apply(function)?

I've looked at this question, but it didn't help me.

CodePudding user response:

Try groupby with apply and a condition:

df.groupby('ID', as_index=False)['Data'].apply(lambda x: x.sum() / 4 if x.name in some_list else x.mean())

Output:

   ID   Data
0   1   6.00
1   2  12.00
2   3   2.25

CodePudding user response:

If performance is important dont use groupby.apply, you can filter rows for aggregate sum with division and for aggregate mean:

s = df[df['ID'].isin(some_list)].groupby('ID')['Data'].sum().div(4)
df = s.combine_first(df.groupby('ID')['Data'].mean()).reset_index()

print (df)
   ID   Data
0   1   6.00
1   2  12.00
2   3   2.25

Here is alternative solution:

df = df.groupby('ID')['Data'].agg(['sum','mean']).reset_index()
df['Value'] = np.where(df['ID'].isin(some_list), df.pop('sum').div(4), df.pop('mean'))
print (df)
   ID  Value
0   1   6.00
1   2  12.00
2   3   2.25
  • Related