I have data similar to:
Date | A | B |
---|---|---|
1/5/22 | 4 | 5 |
1/6/22 | 8 | 3 |
1/8/22 | 1 | 5 |
2/10/22 | 4 | 8 |
2/28/22 | 6 | 4 |
3/15/22 | 6 | 0 |
3/21/22 | 0 | 0 |
3/25/22 | 5 | 7 |
I want to keep only the row with the last date of each month, i.e. the latest date present in the data for that month (not necessarily the calendar month end), as below:
Date | A | B |
---|---|---|
1/8/22 | 1 | 5 |
2/28/22 | 6 | 4 |
3/25/22 | 5 | 7 |
I tried using the following code:
df.resample("M").last()
However, I get the following error: 'DataFrame' object has no attribute 'to_datetime'
But my index is the Date column, which I converted with pd.to_datetime as below:
df['Date'] = df['Date'].apply(lambda x: str(x))
df['Date'] = pd.to_datetime(df['Date']).dt.date
df.set_index('Date', inplace=True)
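One likely problem with the code above is the trailing .dt.date: it turns the column into plain Python datetime.date objects (object dtype), so after set_index the index is not a DatetimeIndex, and resample only works on a DatetimeIndex, TimedeltaIndex or PeriodIndex. Even with a proper DatetimeIndex, resample("M").last() labels each result row with the calendar month end (e.g. 2022-01-31) and takes the last non-null value of each column separately, so it would not return the original 1/8/22 row as such. A minimal sketch, assuming the sample data above and month/day/two-digit-year strings, that keeps a datetime64 index and selects the last row of each month with groupby(...).tail(1):

```python
import pandas as pd

# Sample data from the question (dates are month/day/year strings)
df = pd.DataFrame({
    "Date": ["1/5/22", "1/6/22", "1/8/22", "2/10/22", "2/28/22",
             "3/15/22", "3/21/22", "3/25/22"],
    "A": [4, 8, 1, 4, 6, 6, 0, 5],
    "B": [5, 3, 5, 8, 4, 0, 0, 7],
})

# Keep datetime64 values (no .dt.date) so the index becomes a DatetimeIndex
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")
df = df.set_index("Date").sort_index()

# Group rows by calendar month and keep the last (latest) row of each group
out = df.groupby(df.index.to_period("M")).tail(1)
print(out)
```

Because tail(1) returns whole rows, the actual dates (2022-01-08, 2022-02-28, 2022-03-25) stay in the index instead of being replaced by month-end labels.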
Answer:
You can use a monthly period (convert with to_datetime, then to_period) and groupby.idxmax:
import pandas as pd

# convert to datetime
date = pd.to_datetime(df['Date'], dayfirst=False)
# get index of last day per monthly period
out = df.loc[date.groupby(date.dt.to_period('M')).idxmax()]
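To try the approach end to end, here is a self-contained sketch that rebuilds the sample data from the question with its default 0..7 index. The explicit format="%m/%d/%y" is an assumption that every date is a month/day/two-digit-year string; the original relies on dayfirst=False instead.

```python
import pandas as pd

# Sample data from the question, with the default RangeIndex 0..7
df = pd.DataFrame({
    "Date": ["1/5/22", "1/6/22", "1/8/22", "2/10/22", "2/28/22",
             "3/15/22", "3/21/22", "3/25/22"],
    "A": [4, 8, 1, 4, 6, 6, 0, 5],
    "B": [5, 3, 5, 8, 4, 0, 0, 7],
})

# Parse the strings as datetimes (month first); df["Date"] itself is untouched
date = pd.to_datetime(df["Date"], format="%m/%d/%y")

# For each monthly period, idxmax returns the row label of the latest date
last_idx = date.groupby(date.dt.to_period("M")).idxmax()

out = df.loc[last_idx]
print(out)
```

Since only the helper Series date is converted, the selected rows keep their original Date strings and index labels.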
output:
Date A B
2 1/8/22 1 5
4 2/28/22 6 4
7 3/25/22 5 7
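NB. idxmax returns only the first matching row per month. If you expect more than one row with the "last date" in a month (duplicate dates), use this to keep all of them: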
out = df.loc[date.isin(date.groupby(date.dt.to_period('M')).max())]
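The difference between the two variants only shows up with duplicate dates. A small sketch on hypothetical data, in which 1/8/22 appears twice, compares them:

```python
import pandas as pd

# Hypothetical data: the last date of January (1/8/22) appears twice
df = pd.DataFrame({
    "Date": ["1/5/22", "1/8/22", "1/8/22", "2/10/22", "2/28/22"],
    "A": [4, 1, 9, 4, 6],
    "B": [5, 5, 2, 8, 4],
})
date = pd.to_datetime(df["Date"], format="%m/%d/%y")
month = date.dt.to_period("M")

# idxmax keeps only the first row holding each month's maximum date
first_only = df.loc[date.groupby(month).idxmax()]

# isin against the per-month maxima keeps every row on those dates
all_rows = df.loc[date.isin(date.groupby(month).max())]

print(first_only)
print(all_rows)
```

Here first_only keeps rows 1 and 4, while all_rows also keeps row 2.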
If Date is the index:
date = pd.to_datetime(df.index.to_series(), dayfirst=False)
out = df.loc[date.groupby(date.dt.to_period('M')).idxmax()]
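A self-contained sketch of the index variant, assuming the Date strings from the question have been moved into the index with set_index. to_series() copies the index into a Series whose index is the same labels, so idxmax returns labels that df.loc can look up directly:

```python
import pandas as pd

# Sample data from the question, with Date moved into the index
df = pd.DataFrame({
    "Date": ["1/5/22", "1/6/22", "1/8/22", "2/10/22", "2/28/22",
             "3/15/22", "3/21/22", "3/25/22"],
    "A": [4, 8, 1, 4, 6, 6, 0, 5],
    "B": [5, 3, 5, 8, 4, 0, 0, 7],
}).set_index("Date")

# Series indexed by the original labels, holding the parsed datetimes
date = pd.to_datetime(df.index.to_series(), format="%m/%d/%y")

# idxmax per month gives back original index labels such as "1/8/22"
out = df.loc[date.groupby(date.dt.to_period("M")).idxmax()]
print(out)
```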