How to group by the last date in a month in pandas

Time:09-28

I have a DataFrame df:

qty         val         date
10          100         2021-09-15
11          111         2021-09-22
20          222         2021-09-22
35          333         2021-09-15
45          444         2021-10-15
79          555         2021-10-15
79          555         2021-10-07
87          666         2021-11-15
45          777         2021-11-15
97          888         2021-11-22

I am trying to group the values by the last (maximum) date of each month, so I would only take the rows dated 2021-09-22, 2021-10-15 and 2021-11-22, because these are the latest dates in their months. During the groupby I would like to get the sum of qty and val for those dates. Alternatively, I could group by date first and then select only the last date of each month, but I assume that removing the unwanted rows first is more efficient.
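The second option mentioned above (aggregate per date first, then keep only each month's last date) can be sketched as follows. This is a minimal sketch on the sample data, assuming the date column is parsed with pd.to_datetime; the names daily and result are illustrative, and no claim is made about which option is faster.

```python
import pandas as pd

# Sample data from the question, with 'date' parsed as datetime
df = pd.DataFrame({
    'qty': [10, 11, 20, 35, 45, 79, 79, 87, 45, 97],
    'val': [100, 111, 222, 333, 444, 555, 555, 666, 777, 888],
    'date': pd.to_datetime([
        '2021-09-15', '2021-09-22', '2021-09-22', '2021-09-15',
        '2021-10-15', '2021-10-15', '2021-10-07',
        '2021-11-15', '2021-11-15', '2021-11-22',
    ]),
})

# Step 1: sum qty and val per distinct date (groupby sorts the dates ascending)
daily = df.groupby('date')[['qty', 'val']].sum()

# Step 2: within each calendar month, keep only the last row, i.e. the latest date
result = daily.groupby(daily.index.to_period('M')).tail(1)
print(result)
```

Because daily is sorted by date, tail(1) within each month picks that month's maximum date.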

I tried using:

group = df.groupby('date')
group_max = group.apply(lambda g: g[g['date'] == g['date'].max()])

But that did not work. How can I filter, either before or after grouping, so that I only get the rows for the last date of each month?
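The attempt groups by the exact date, so every group holds a single date and each row trivially equals its group's maximum; the filter therefore keeps everything. A minimal check on the sample data, using transform('max') as an equivalent stand-in for the apply call (the name same_as_group_max is illustrative):

```python
import pandas as pd

# Sample data from the question, with 'date' parsed as datetime
df = pd.DataFrame({
    'qty': [10, 11, 20, 35, 45, 79, 79, 87, 45, 97],
    'val': [100, 111, 222, 333, 444, 555, 555, 666, 777, 888],
    'date': pd.to_datetime([
        '2021-09-15', '2021-09-22', '2021-09-22', '2021-09-15',
        '2021-10-15', '2021-10-15', '2021-10-07',
        '2021-11-15', '2021-11-15', '2021-11-22',
    ]),
})

# Grouping by the exact date: each group contains one distinct date,
# so comparing a row's date to its group's max date is always True
same_as_group_max = df['date'].eq(df.groupby('date')['date'].transform('max'))
print(same_as_group_max.all())  # True: no row is filtered out
```

The grouping key has to be the month, not the individual date, for the max comparison to remove anything.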

Answer:

You can use pandas.Grouper to group by calendar month and keep the rows whose date equals the latest date in their month:

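import pandas as pd

# pre-requisite: 'date' must be a datetime column
df['date'] = pd.to_datetime(df['date'])

# one group per calendar month ('M' is month end; pandas >= 2.2 deprecates it in favour of 'ME')
g = df.groupby(pd.Grouper(freq='M', key='date'))
# keep only the rows whose date equals the latest date of their month
df[df['date'].eq(g['date'].transform('max'))]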

output:

   qty  val       date
1   11  111 2021-09-22
2   20  222 2021-09-22
4   45  444 2021-10-15
5   79  555 2021-10-15
9   97  888 2021-11-22
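The question also asks for the sum of qty and val on those dates; one way is to aggregate the filtered rows by date. This is a minimal sketch on the sample data. As an assumption, it swaps pd.Grouper(freq='M') for a dt.to_period('M') month key, which avoids the 'M'/'ME' alias difference between pandas versions; the names month, last and totals are illustrative.

```python
import pandas as pd

# Sample data from the question, with 'date' parsed as datetime
df = pd.DataFrame({
    'qty': [10, 11, 20, 35, 45, 79, 79, 87, 45, 97],
    'val': [100, 111, 222, 333, 444, 555, 555, 666, 777, 888],
    'date': pd.to_datetime([
        '2021-09-15', '2021-09-22', '2021-09-22', '2021-09-15',
        '2021-10-15', '2021-10-15', '2021-10-07',
        '2021-11-15', '2021-11-15', '2021-11-22',
    ]),
})

# Month key: one Period per calendar month (used here instead of pd.Grouper)
month = df['date'].dt.to_period('M')

# Keep only the rows on the latest date of each month
last = df[df['date'].eq(df.groupby(month)['date'].transform('max'))]

# Sum qty and val for each of those last dates
totals = last.groupby('date', as_index=False)[['qty', 'val']].sum()
print(totals)
```

Filtering first means the sum only runs over the rows that are kept, which matches the order the question prefers.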