Home > Back-end >  Find max by year and return date on which max occurred in Pandas with dates as index
Find max by year and return date on which max occurred in Pandas with dates as index

Time:11-26

I have this dataframe

date,AA
1980-01-01, 77.7
1980-01-02, 86
1980-01-03, 92.3
1980-01-04, 96.4
1980-01-05, 85.7
1980-01-06, 75.7
1980-01-07, 86.8
1980-01-08, 93.2
1985-08-13, 224.6
1985-08-14, 213.9
1985-08-15, 205.7
1985-08-16, 207.3
1985-08-17, 202.1

I would like to compute the max for each year and the date where it happens. I am struggling because I would like to keep indeed the date as index.

Indeed I read it as:

dfr    = pd.read_csv(fnamed, sep=',', header = 0, index_col=0, parse_dates=True)

I know that I could resample as

dfr_D = dfr.resample('Y').max()

but in this case I would lose the information about the location of the maximum value within the year.

I have found this:

idx = dfr.groupby(lambda x: dfr['date'][x].year)["A"].idxmax()

However, dfr['date'] seems to be the name of the column while in my case date in an index and '.year' is not one of its properties.

I have the felling that I should work with "groupby" and "indexmax". However, all the attends that I made, they all failed.

Thanks in advance

CodePudding user response:

Assuming "date" is of datetime type and a column, you can use the following to slice your data with the maximum per group:

df.loc[df.groupby(df['date'].dt.year)['AA'].idxmax().values]

output:

        date     AA
3 1980-01-04   96.4
8 1985-08-13  224.6

If "date" is the index:

df.loc[df.groupby(df.index.year)['AA'].idxmax().values]

output:

               AA
date             
1980-01-04   96.4
1985-08-13  224.6
  • Related