Say I have pandas dataframe with index column having dates. And another column having some values.
Date Value
2021-04-05 101
2021-04-12 200
2021-04-20 15
2021-05-21 12
2021-05-28 11
I want to add a row at the end of each month showing the total for that month:
Date Value
2021-04-05 101
2021-04-12 200
2021-04-20 15
April 2021 316
2021-05-21 12
2021-05-28 11
May 2021 23
How do I do that? I know I will need to use groupby, that's not an issue, I use smth like this:
df.groupby(df['Date'].dt.strftime('%B %Y'))['Values'].sum().sort_values()
But I don't know how to do the rest.
CodePudding user response:
You can keep the last of Date
, then pass the searchsorted
, and use the sort_index
s = df.groupby(df['Date'].dt.strftime('%B %Y')).agg({'Date':'last','Value':'sum'})
index = np.searchsorted(df.Date,s.Date)
s = s.drop(['Date'],axis=1).reset_index()
s.index = index
s = pd.concat([df,s]).sort_index()
Out[729]:
Date Value
0 2021-04-05 00:00:00 101
1 2021-04-12 00:00:00 200
2 2021-04-20 00:00:00 15
2 April 2021 316
3 2021-05-21 00:00:00 12
4 2021-05-28 00:00:00 11
4 May 2021 23
CodePudding user response:
If Date
is sorted, you can use idxmax
:
s = (df.groupby(df['Date'].dt.strftime('%B %Y'))
.agg(idx=("Date", "idxmax"), Value=("Value", "sum"))
.reset_index().set_index("idx"))
print (df.append(s).sort_index())
Date Value
0 2021-04-05 00:00:00 101
1 2021-04-12 00:00:00 200
2 2021-04-20 00:00:00 15
2 April 2021 316
3 2021-05-21 00:00:00 12
4 2021-05-28 00:00:00 11
4 May 2021 23