I have a dataframe
:
date C P
0 15.4.21 0.06 0.94
1 16.4.21 0.15 1.32
2 2.5.21 0.06 1.17
3 8.5.21 0.20 0.82
4 9.6.21 0.04 -5.09
5 1.2.22 0.05 7.09
I need to create 2 columns where I Sum both C
and P
for each month.
So new df
will have 2 columns, for example for the month 4
(April) (0.06 0.94 0.15 1.32) = 2.47, so new df
:
4/21 5/21 6/21 2/22
0 2.47 2.25 .. ..
Columns names and order doesn't matter, actualy a string month name even better(April 22).
I was playing with something like this, which is not what i need:
df[['C','P']].groupby(df['date'].dt.to_period('M')).sum()
CodePudding user response:
You almost had it, you need to convert first to_datetime
:
out = (df[['C','P']]
.groupby(pd.to_datetime(df['date'], day_first=True)
.dt.to_period('M'))
.sum()
)
Output:
C P
date
2021-02 0.06 1.17
2021-04 0.21 2.26
2021-08 0.20 0.82
2021-09 0.04 -5.09
2022-01 0.05 7.09
If you want the grand total, sum
again:
out = (df[['C','P']]
.groupby(pd.to_datetime(df['date']).dt.to_period('M'))
.sum().sum(axis=1)
)
Output:
date
2021-02 1.23
2021-04 2.47
2021-08 1.02
2021-09 -5.05
2022-01 7.14
Freq: M, dtype: float64
as "Month year"
If you want a string, better convert it in the end to keep the order:
out.index = out.index.strftime('%B %y')
Output:
date
February 21 1.23
April 21 2.47
August 21 1.02
September 21 -5.05
January 22 7.14
dtype: float64