pandas groupby and agg getting TypeError

Time:02-16

I saw that you can chain groupby and agg so that pandas produces a new DataFrame: it groups the old DataFrame by the fields you specify and then aggregates other fields with some function (sum in the example below).

However, when I wrote the following:

import pandas as pd

# initialize list of lists
data = [['tom', 10, 100], ['tom', 15, 200], ['nick', 15, 150], ['juli', 14, 140]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age', 'salary'])

# trying to groupby and agg
grouping_vars = ['Name']
nlg_study_grouped = df(grouping_vars,axis = 0).agg({'Name': sum}).reset_index()

The DataFrame looks like this:

Name Age salary
tom 10 100
tom 15 200
nick 15 150
juli 14 140

I expect the output to look like this, because it groups by Name and then sums the salary field:

Name salary
tom 300
nick 150
juli 140

The code works in someone else's example, but my toy example is producing this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-6fb9c0ade242> in <module>
      1 grouping_vars = ['Name']
      2 
----> 3 nlg_study_grouped = df(grouping_vars,axis = 0).agg({'Name': sum}).reset_index()

TypeError: 'DataFrame' object is not callable

I wonder if I missed something dumb.

CodePudding user response:

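The error occurs because df(grouping_vars, axis = 0) calls the DataFrame itself as if it were a function. groupby is a method, so it has to be written as df.groupby(...). Also aggregate salary rather than Name, since Name is the grouping key. You can try this:

print(df.groupby('Name')['salary'].sum())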
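Applied to the code from the question, a minimal sketch of the corrected call could look like the following. It assumes the goal is the total salary per name, as in the expected output. Passing sort=False is an addition here: groupby sorts the group keys by default, so without it the rows would come back as juli, nick, tom.

```python
import pandas as pd

data = [['tom', 10, 100], ['tom', 15, 200], ['nick', 15, 150], ['juli', 14, 140]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'salary'])

grouping_vars = ['Name']

# Call the .groupby() method; df(...) would try to call the DataFrame itself.
# sort=False keeps the groups in order of first appearance (tom, nick, juli).
# The agg dict names the column to aggregate (salary), not the grouping key.
nlg_study_grouped = (
    df.groupby(grouping_vars, sort=False)
      .agg({'salary': 'sum'})
      .reset_index()
)

print(nlg_study_grouped)
```

The Age column is dropped from the result because it is not listed in the agg dict.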

To apply several aggregation functions at once and name the resulting columns:

print(df.groupby(['Name'])['salary']
         .agg([('average','mean'),('total','sum'),('product','prod')])
         .reset_index())
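The same result can also be written with keyword-style named aggregation, which some find easier to read than a list of tuples. This is only an alternative sketch using the question's data; the output column names (average, total, product) are copied from the snippet above.

```python
import pandas as pd

data = [['tom', 10, 100], ['tom', 15, 200], ['nick', 15, 150], ['juli', 14, 140]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'salary'])

# Each keyword becomes an output column; its value is the aggregation to apply.
summary = (
    df.groupby('Name')['salary']
      .agg(average='mean', total='sum', product='prod')
      .reset_index()
)

print(summary)
```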

If you want to group by multiple columns, pass all of the column names in the list given to groupby:

Ex: df.groupby(['Name','AnotherColumn'])...
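As a rough illustration of grouping by two columns, here is a small sketch. The Dept column and its values are made up for this example and are not part of the question's data.

```python
import pandas as pd

# 'Dept' is a hypothetical second grouping column used only for illustration.
data = [
    ['tom', 'sales', 100],
    ['tom', 'sales', 200],
    ['tom', 'ops', 50],
    ['nick', 'ops', 150],
]
df = pd.DataFrame(data, columns=['Name', 'Dept', 'salary'])

# as_index=False keeps the grouping keys as regular columns,
# so no reset_index() call is needed afterwards.
totals = df.groupby(['Name', 'Dept'], as_index=False)['salary'].sum()

print(totals)
```

Each distinct (Name, Dept) pair becomes one row in the result.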

For more detail, you can refer to the question "Aggregation in Pandas".
