Home > Software engineering >  Pandas GroupBy: comma separated list of sums
Pandas GroupBy: comma separated list of sums

Time:02-15

I have the below groupby which is summing the Amounts at the "ParentAccount" level. I am trying to show on the same line the details behind that sum. I have the comma separated list of accounts showing next to the amount total but would also like to add in a single column that shows a comma separated sum at the account level.

So for the below code I would have the following float strings in a separate column

ParentAccount 1: 3.75, 1
ParentAccount 2: 14, 10.5

Not sure of the best way to go about doing this. I tried doing a merge of two separate groupbys but think there is probably a better way of doing this.

import pandas as pd

data = {
        'ParentAccount': [1,1,1,2,2,2],
        'Account': ['A', 'A', 'C', 'D', 'D','E'],
        'Amount':  [1.5, 2.25, 1, 4.75, 9.25, 10.50],
        }

df = pd.DataFrame(data)
df_final = df.groupby('ParentAccount').agg({'Amount': 'sum', 'Account': lambda x: ','.join(x.unique()),}).add_suffix('-Net')

print(df_final)

CodePudding user response:

You could groupby "ParentAccount" and "Account" to find the sum; then groupby "ParentAccount" again, and pass an unpacked dictionary to agg to do the things you want: (i) Summing the amount and (ii) joining the unique accounts for each ParentAccount (iii) joining the amounts per account for each ParentAccount:

out = (df
       .groupby(['ParentAccount','Account'])
       .sum()
       .reset_index(level=1)
       .groupby(level=0)
       .agg(**{'Amount-Net': ('Amount','sum'), 
               'Account-Net': ('Account', lambda x: ', '.join(x)) , 
               'Amounts per Account': ('Amount', lambda x: ', '.join(x.astype(str)))}))

Output:

               Amount-Net Account-Net Amounts per Account
ParentAccount                                            
1                    4.75        A, C           3.75, 1.0
2                   24.50        D, E          14.0, 10.5    

CodePudding user response:

Use a double groupby:

out = (
    df.groupby(['ParentAccount', 'Account'], as_index=False)['Amount'].sum()
      .groupby('ParentAccount', as_index=False)
      .agg(**{'Amount-Net': ('Amount', 'sum'),
              'Amount-Detail': ('Amount', lambda x: ','.join(x.astype(str))), 
              'Account-Net': ('Account', ','.join)})
)

Output:

>>> out
   ParentAccount  Amount-Net Amount-Detail Account-Net
0              1        4.75      3.75,1.0         A,C
1              2       24.50     14.0,10.5         D,E
  • Related