Home > front end >  How can I grab rows with max date for each year by ID from Pandas dataframe?
How can I grab rows with max date for each year by ID from Pandas dataframe?

Time:06-19

sample dataframe looks:

      ID       Date               Value
      2       2020-06-30          124
      1       2020-09-30          265
      1       2021-12-31          140 
      1       2020-12-31          142
      2       2020-12-31          147
      1       2019-12-31          677
      1       2021-03-31          235
      2       2021-09-30          917
      2       2021-03-31          149

I want to grab rows of max date for each year of each ID. 
The final output would be:
      
     ID       Date               Value
      1       2019-12-31          677
      1       2020-12-31          142
      1       2021-12-31          140 
      2       2020-12-31          147
      2       2021-09-30          917

I tried groupby ID but not sure how to grab rows by max date for each year.

Many thanks for your help!

CodePudding user response:

Here is one way to accomplish it

df.assign(yr=df['Date'].astype('datetime64').dt.year).groupby(['ID','yr']).max().reset_index().drop(columns=['yr'])

since a max for each year is needed, a temporary year is created via assign, then grouped by id and year to get the max for each year. Finally dropping the yr column from result

    ID  Date      Value
0   1   2019-12-31  677
1   1   2020-12-31  265
2   1   2021-12-31  235
3   2   2020-12-31  147
4   2   2021-09-30  917

CodePudding user response:

First you would need to extract year from date:

df['year'] = pd.DatetimeIndex(df['Date']).year

then if you want to grab rows with max date in a year, get the max date:

maxDf = df.groupby([year])['Date'].max()

then you can filter you dataframe on the max dates

maxDates = maxDf['Date'].values.tolist()
df.loc[df['Date'].isin(maxDates)]
  • Related