How do I add a row at the end of each month that show the sum for each month?-CodePudding

Say I have pandas dataframe with index column having dates. And another column having some values.

Date        Value
2021-04-05   101
2021-04-12   200
2021-04-20   15
2021-05-21   12 
2021-05-28   11

I want to add a row at the end of each month showing the total for that month:

Date        Value
2021-04-05   101
2021-04-12   200
2021-04-20   15
April 2021   316
2021-05-21   12 
2021-05-28   11
May 2021     23

How do I do that? I know I will need to use groupby, that's not an issue, I use smth like this:

df.groupby(df['Date'].dt.strftime('%B %Y'))['Values'].sum().sort_values()

But I don't know how to do the rest.

CodePudding user response：

You can keep the last of Date , then pass the searchsorted, and use the sort_index

s = df.groupby(df['Date'].dt.strftime('%B %Y')).agg({'Date':'last','Value':'sum'})
index = np.searchsorted(df.Date,s.Date)
s = s.drop(['Date'],axis=1).reset_index()
s.index = index
s = pd.concat([df,s]).sort_index()
Out[729]: 
                  Date  Value
0  2021-04-05 00:00:00    101
1  2021-04-12 00:00:00    200
2  2021-04-20 00:00:00     15
2           April 2021    316
3  2021-05-21 00:00:00     12
4  2021-05-28 00:00:00     11
4             May 2021     23

CodePudding user response：

If Date is sorted, you can use idxmax:

s = (df.groupby(df['Date'].dt.strftime('%B %Y'))
       .agg(idx=("Date", "idxmax"), Value=("Value", "sum"))
       .reset_index().set_index("idx"))

print (df.append(s).sort_index())

                  Date  Value
0  2021-04-05 00:00:00    101
1  2021-04-12 00:00:00    200
2  2021-04-20 00:00:00     15
2           April 2021    316
3  2021-05-21 00:00:00     12
4  2021-05-28 00:00:00     11
4             May 2021     23