Suppose I have the following DataFrame:
df = pd.DataFrame(
{
'year': [2015,2015,2018,2018,2020],
'total': [100,200,50,150,400],
'tax': [10,20,5,15,40]
}
)
I want to sum up the total and tax columns by year and obtain the size
at the same time.
The following code gives me the sum of the two columns:
df_total_tax = df.groupby('year', as_index=False)
df_total_tax = df_total_tax[['total','tax']].apply(np.sum)
However, I can't figure out how to also include a column for size
at the same time. Must I perform a different groupby
, then use .size()
and then append that column to df_total_tax
? Or is there an easier way?
The end result would look like this:
Thanks
CodePudding user response:
You can specify for each column separately aggregate function in named aggregation:
df = df.groupby('year', as_index=False).agg(total=('total','sum'),
tax=('tax','sum'),
size=('tax', 'size'))
print (df)
year total tax size
0 2015 300 30 2
1 2018 200 20 2
2 2020 400 40 1