Home > database >  Sum all columns by month?
Sum all columns by month?

Time:11-22

I have a dataframe:

    date           C      P
0   15.4.21       0.06   0.94
1   16.4.21       0.15   1.32
2   2.5.21        0.06   1.17
3   8.5.21        0.20   0.82
4   9.6.21        0.04  -5.09
5   1.2.22        0.05   7.09

I need to create 2 columns where I Sum both C and P for each month.

So new df will have 2 columns, for example for the month 4 (April) (0.06 0.94 0.15 1.32) = 2.47, so new df:

     4/21    5/21    6/21    2/22  

0    2.47    2.25    ..      ..

Columns names and order doesn't matter, actualy a string month name even better(April 22).

I was playing with something like this, which is not what i need:

df[['C','P']].groupby(df['date'].dt.to_period('M')).sum()

CodePudding user response:

You almost had it, you need to convert first to_datetime:

out = (df[['C','P']]
       .groupby(pd.to_datetime(df['date'], day_first=True)
                  .dt.to_period('M'))
       .sum()
      )

Output:

            C     P
date               
2021-02  0.06  1.17
2021-04  0.21  2.26
2021-08  0.20  0.82
2021-09  0.04 -5.09
2022-01  0.05  7.09

If you want the grand total, sum again:

out = (df[['C','P']]
       .groupby(pd.to_datetime(df['date']).dt.to_period('M'))
       .sum().sum(axis=1)
       )

Output:

date
2021-02    1.23
2021-04    2.47
2021-08    1.02
2021-09   -5.05
2022-01    7.14
Freq: M, dtype: float64

as "Month year"

If you want a string, better convert it in the end to keep the order:

out.index = out.index.strftime('%B %y')

Output:

date
February 21     1.23
April 21        2.47
August 21       1.02
September 21   -5.05
January 22      7.14
dtype: float64
  • Related