I saw that you can chain groupby and agg to have pandas build a new DataFrame that groups the original DataFrame by the fields you specify and then aggregates other fields with a function (sum in the example below).
However, when I wrote the following:
# initialize list of lists
data = [['tom', 10, 100], ['tom', 15, 200], ['nick', 15, 150], ['juli', 14, 140]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age', 'salary'])
# trying to groupby and agg
grouping_vars = ['Name']
nlg_study_grouped = df(grouping_vars,axis = 0).agg({'Name': sum}).reset_index()
Name | Age | salary |
---|---|---|
tom | 10 | 100 |
tom | 15 | 200 |
nick | 15 | 150 |
juli | 14 | 140 |
I expect the output to look like this, because it groups by Name and then sums the salary field:
Name | salary |
---|---|
tom | 300 |
nick | 150 |
juli | 140 |
The code works in someone else's example, but my toy example is producing this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-6fb9c0ade242> in <module>
1 grouping_vars = ['Name']
2
----> 3 nlg_study_grouped = df(grouping_vars,axis = 0).agg({'Name': sum}).reset_index()
TypeError: 'DataFrame' object is not callable
I wonder if I missed something dumb.
CodePudding user response:
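The error comes from df(grouping_vars, axis = 0): that calls the DataFrame itself as if it were a function, because the .groupby method name is missing. The agg dictionary also sums Name, whereas your expected output sums salary. You can try this: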
print(df.groupby('Name').sum()['salary'])
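If you would rather keep the groupby / agg / reset_index shape from the question, a minimal corrected sketch could look like the following. It reuses the question's data and variable names; the only additions are the .groupby call, 'salary' as the aggregated column, and sort=False, which I am assuming you want so the groups come out in the order tom, nick, juli rather than alphabetically.

```python
import pandas as pd

# Same toy data as in the question
data = [['tom', 10, 100], ['tom', 15, 200], ['nick', 15, 150], ['juli', 14, 140]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'salary'])

grouping_vars = ['Name']

# .groupby was missing in the original line; aggregate 'salary', not 'Name'.
# sort=False keeps groups in order of first appearance instead of sorting the keys.
nlg_study_grouped = (
    df.groupby(grouping_vars, sort=False)
      .agg({'salary': 'sum'})
      .reset_index()
)

print(nlg_study_grouped)
```

Without sort=False the result holds the same totals, but the rows are ordered by the group key (juli, nick, tom).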
To apply several aggregation functions at once and name the resulting columns:
print(df.groupby(['Name'])['salary']
.agg([('average','mean'),('total','sum'),('product','prod')])
.reset_index())
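The same result can also be written with keyword-style named aggregation, which some people find easier to read than a list of tuples. This is only a sketch on the question's toy data; the output column names average, total and product are the ones used above.

```python
import pandas as pd

# Same toy data as in the question
data = [['tom', 10, 100], ['tom', 15, 200], ['nick', 15, 150], ['juli', 14, 140]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'salary'])

# Each keyword is an output column name; each value is the aggregation to apply
summary = (
    df.groupby('Name')['salary']
      .agg(average='mean', total='sum', product='prod')
      .reset_index()
)

print(summary)
```

Rows come back sorted by Name here because groupby sorts the group keys by default.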
If you want to group by multiple columns, pass all of the column names in the list given to groupby.
Ex: df.groupby(['Name','AnotherColumn'])...
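As a rough illustration of grouping by two columns: the question's data has no second grouping column, so the Dept column and the rows below are made up purely for this sketch.

```python
import pandas as pd

# Hypothetical data: 'Dept' is an invented column, not part of the question
df2 = pd.DataFrame(
    [['tom', 'ops', 100], ['tom', 'ops', 200], ['tom', 'dev', 50], ['nick', 'dev', 150]],
    columns=['Name', 'Dept', 'salary'],
)

# One output row per distinct (Name, Dept) pair;
# as_index=False keeps the group keys as ordinary columns
totals = df2.groupby(['Name', 'Dept'], as_index=False)['salary'].sum()

print(totals)
```

Using as_index=False is an alternative to calling reset_index() afterwards; either should give a flat DataFrame.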
For more detail, you can refer to the question "Aggregation in Pandas".