I have a DataFrame df:
qty val date
10 100 2021-09-15
11 111 2021-09-22
20 222 2021-09-22
35 333 2021-09-15
45 444 2021-10-15
79 555 2021-10-15
79 555 2021-10-07
87 666 2021-11-15
45 777 2021-11-15
97 888 2021-11-22
I am trying to group the values by the max(date) of each month, so I would only take the values of 2021-09-22 / 2021-10-15 / 2021-11-22, because these dates are the latest in each month. I would expect to get the sum of qty and val for those dates during the groupby. Alternatively, I could just group by date and then select only the values of the last date in each month, but I assume that removing unwanted data first is more efficient.
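The second option (group by date first, then keep only the last date of each month) can be sketched like this, assuming pandas and the sample data from the question. The month key comes from Series.dt.to_period('M'); the variable names per_date, month, is_last and result are illustrative, not from the question.

```python
import pandas as pd

# Sample data from the question, with 'date' parsed as datetimes.
df = pd.DataFrame({
    'qty': [10, 11, 20, 35, 45, 79, 79, 87, 45, 97],
    'val': [100, 111, 222, 333, 444, 555, 555, 666, 777, 888],
    'date': pd.to_datetime([
        '2021-09-15', '2021-09-22', '2021-09-22', '2021-09-15', '2021-10-15',
        '2021-10-15', '2021-10-07', '2021-11-15', '2021-11-15', '2021-11-22',
    ]),
})

# Step 1: total qty and val for every distinct date (sorted by date).
per_date = df.groupby('date', as_index=False)[['qty', 'val']].sum()

# Step 2: keep only the row holding the latest date of each calendar month.
month = per_date['date'].dt.to_period('M').rename('month')
is_last = per_date['date'].eq(per_date.groupby(month)['date'].transform('max'))
result = per_date[is_last].reset_index(drop=True)
print(result)
```

This sums every date first and discards most of the sums afterwards, which is why filtering first can do less work on large frames.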
I tried using:
group = df.groupby('date')
group_max = group.apply(lambda g: g[g['date'] == g['date'].max()])
But with no luck. How would I filter, before or after grouping, so that I only get the values of the last date of each month in the date column?
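To see why the attempt keeps every row, here is a small sketch (assuming pandas and the question's sample data; the names dates_per_group and keep are illustrative). When you group by 'date' itself, each group contains exactly one distinct date, so "this row's date equals its group's max date" is always true. It uses transform('max') instead of the original apply, but the comparison is the same.

```python
import pandas as pd

# Sample data from the question, with 'date' parsed as datetimes.
df = pd.DataFrame({
    'qty': [10, 11, 20, 35, 45, 79, 79, 87, 45, 97],
    'val': [100, 111, 222, 333, 444, 555, 555, 666, 777, 888],
    'date': pd.to_datetime([
        '2021-09-15', '2021-09-22', '2021-09-22', '2021-09-15', '2021-10-15',
        '2021-10-15', '2021-10-07', '2021-11-15', '2021-11-15', '2021-11-22',
    ]),
})

# Grouping by the exact date leaves one distinct date in each group ...
dates_per_group = df.groupby('date')['date'].nunique()

# ... so comparing each row with its group's max date is always True.
keep = df['date'].eq(df.groupby('date')['date'].transform('max'))
print(dates_per_group.max(), keep.all())  # 1 True
```

The max has to be taken over a coarser key (the month) for the comparison to drop anything.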
Answer:
You can use pandas.Grouper:
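import pandas as pd
# prerequisite: parse the 'date' strings into datetimes
df['date'] = pd.to_datetime(df['date'])
# group rows by calendar month ('M' = month end; pandas 2.2+ prefers the alias 'ME')
g = df.groupby(pd.Grouper(freq='M', key='date'))
# keep only the rows whose date equals the latest date in their month
df[df['date'].eq(g['date'].transform('max'))]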
output:
qty val date
1 11 111 2021-09-22
2 20 222 2021-09-22
4 45 444 2021-10-15
5 79 555 2021-10-15
9 97 888 2021-11-22
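To get the sums the question asks for, the filtered rows can then be grouped by date and summed. A minimal sketch, assuming the question's sample data; it swaps the answer's pd.Grouper(freq='M') for a dt.to_period('M') month key, which sidesteps the 'M' vs 'ME' offset alias difference between pandas versions. The names month, last_rows and totals are illustrative.

```python
import pandas as pd

# Sample data from the question, with 'date' parsed as datetimes.
df = pd.DataFrame({
    'qty': [10, 11, 20, 35, 45, 79, 79, 87, 45, 97],
    'val': [100, 111, 222, 333, 444, 555, 555, 666, 777, 888],
    'date': pd.to_datetime([
        '2021-09-15', '2021-09-22', '2021-09-22', '2021-09-15', '2021-10-15',
        '2021-10-15', '2021-10-07', '2021-11-15', '2021-11-15', '2021-11-22',
    ]),
})

# Filter first: keep rows whose date is the latest date in their calendar month.
month = df['date'].dt.to_period('M').rename('month')
last_rows = df[df['date'].eq(df.groupby(month)['date'].transform('max'))]

# Then sum qty and val for each remaining date.
totals = last_rows.groupby('date', as_index=False)[['qty', 'val']].sum()
print(totals)
```

For the sample data this gives qty/val of 31/333 for 2021-09-22, 124/999 for 2021-10-15 and 97/888 for 2021-11-22.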