sample dataframe looks:
ID Date Value
2 2020-06-30 124
1 2020-09-30 265
1 2021-12-31 140
1 2020-12-31 142
2 2020-12-31 147
1 2019-12-31 677
1 2021-03-31 235
2 2021-09-30 917
2 2021-03-31 149
I want to grab rows of max date for each year of each ID.
The final output would be:
ID Date Value
1 2019-12-31 677
1 2020-12-31 142
1 2021-12-31 140
2 2020-12-31 147
2 2021-09-30 917
I tried groupby ID but not sure how to grab rows by max date for each year.
Many thanks for your help!
CodePudding user response:
Here is one way to accomplish it
df.assign(yr=df['Date'].astype('datetime64').dt.year).groupby(['ID','yr']).max().reset_index().drop(columns=['yr'])
since a max for each year is needed, a temporary year is created via assign, then grouped by id and year to get the max for each year. Finally dropping the yr column from result
ID Date Value
0 1 2019-12-31 677
1 1 2020-12-31 265
2 1 2021-12-31 235
3 2 2020-12-31 147
4 2 2021-09-30 917
CodePudding user response:
First you would need to extract year from date:
df['year'] = pd.DatetimeIndex(df['Date']).year
then if you want to grab rows with max date in a year, get the max date:
maxDf = df.groupby([year])['Date'].max()
then you can filter you dataframe on the max dates
maxDates = maxDf['Date'].values.tolist()
df.loc[df['Date'].isin(maxDates)]