Resampling data to show only data as of the last date of the month

Time:10-05

I have data similar to:

Date A B
1/5/22 4 5
1/6/22 8 3
1/8/22 1 5
2/10/22 4 8
2/28/22 6 4
3/15/22 6 0
3/21/22 0 0
3/25/22 5 7

I want to retain only rows with the last date of the month, similar to below:

Date A B
1/8/22 1 5
2/28/22 6 4
3/25/22 5 7

I tried using the following code:

df.resample("M").last()

However, I get the following error: 'DataFrame' object has no attribute 'to_datetime'

But my index is the Date column, which I converted with pd.to_datetime as shown below:

df['Date'] = df['Date'].apply(lambda x: str(x))
df['Date'] = pd.to_datetime(df['Date']).dt.date
df.set_index('Date', inplace=True)

CodePudding user response:

You can group by monthly period (convert with to_datetime, then to_period) and use groupby.idxmax to get the row holding the latest date in each month:

# convert to datetime
date = pd.to_datetime(df['Date'], dayfirst=False)

# get index of last day per monthly period
out = df.loc[date.groupby(date.dt.to_period('M')).idxmax()]

output:

      Date  A  B
2   1/8/22  1  5
4  2/28/22  6  4
7  3/25/22  5  7
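
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the question's sample data and applies the same idea. The explicit format="%m/%d/%y" is my assumption about the date layout (month first, two-digit year); the answer above uses dayfirst=False instead.

```python
import pandas as pd

# Sample data copied from the question; Date stays a plain string column.
df = pd.DataFrame({
    "Date": ["1/5/22", "1/6/22", "1/8/22", "2/10/22",
             "2/28/22", "3/15/22", "3/21/22", "3/25/22"],
    "A": [4, 8, 1, 4, 6, 6, 0, 5],
    "B": [5, 3, 5, 8, 4, 0, 0, 7],
})

# Parse into a separate datetime64 Series (df itself is not modified).
# Assumption: dates are month/day/two-digit-year.
date = pd.to_datetime(df["Date"], format="%m/%d/%y")

# For each monthly period, idxmax gives the row label of the latest date.
last_idx = date.groupby(date.dt.to_period("M")).idxmax()

# Select those rows from the original frame.
out = df.loc[last_idx]
print(out)
```

Because the parsed dates live in a separate Series, the Date column in the result keeps its original string form.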

NB. idxmax returns a single row per month. If several rows can share the last date of a month and you want all of them, use:

out = df.loc[date.isin(date.groupby(date.dt.to_period('M')).max())]
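
To see the difference between the two variants, here is a small sketch on made-up data (hypothetical, not from the question) in which two rows fall on the last February date. As above, the format string is an assumption about the date layout.

```python
import pandas as pd

# Hypothetical data: two rows share the last date in February.
df = pd.DataFrame({
    "Date": ["1/5/22", "1/8/22", "2/28/22", "2/28/22", "3/25/22"],
    "A": [4, 1, 6, 9, 5],
    "B": [5, 5, 4, 2, 7],
})
date = pd.to_datetime(df["Date"], format="%m/%d/%y")
period = date.dt.to_period("M")

# idxmax keeps one row label per month (the first occurrence of the maximum).
first_only = df.loc[date.groupby(period).idxmax()]

# isin keeps every row whose date equals a monthly maximum.
all_ties = df.loc[date.isin(date.groupby(period).max())]

print(first_only)
print(all_ties)
```

With idxmax only one of the two 2/28/22 rows survives, while the isin version keeps both.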

If Date is the index:

date = pd.to_datetime(df.index.to_series(), dayfirst=False)

out = df.loc[date.groupby(date.dt.to_period('M')).idxmax()]
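
As a rough sketch of the index case, the snippet below reproduces the question's setup. Note that .dt.date leaves plain Python date objects in the index, so it is not a DatetimeIndex, which is a likely reason resample did not work there (this is my reading, not something the question confirms). The format string and the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["1/5/22", "1/6/22", "1/8/22", "2/10/22",
             "2/28/22", "3/15/22", "3/21/22", "3/25/22"],
    "A": [4, 8, 1, 4, 6, 6, 0, 5],
    "B": [5, 3, 5, 8, 4, 0, 0, 7],
})

# Reproduce the question's setup: .dt.date yields Python date objects,
# so the resulting index has object dtype rather than datetime64.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y").dt.date
df = df.set_index("Date")
was_datetime_index = isinstance(df.index, pd.DatetimeIndex)

# Convert the index to a real DatetimeIndex, then reuse the groupby/idxmax idea.
df.index = pd.to_datetime(df.index)
date = df.index.to_series()
out = df.loc[date.groupby(date.dt.to_period("M")).idxmax()]
print(out)
```

Here idxmax returns index labels (timestamps), so df.loc selects by date rather than by row number.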