Home > Software design >  Finding max value in each month of each year
Finding max value in each month of each year

Time:09-27

I have this dataframe:

              Date  Height  Month  Year
0       1971-01-12   0.672     12  1971
1       1971-01-12   0.332     12  1971
2       1971-01-12   0.122     12  1971
3       1971-01-12   0.002     12  1971
4       1971-01-12   0.302     12  1971
           ...     ...    ...   ...
1669476 2022-03-20   1.180      3  2022
1669477 2022-03-20   1.200      3  2022
1669478 2022-03-20   1.220      3  2022
1669479 2022-03-20   1.240      3  2022
1669480 2022-03-20   1.260      3  2022

[1669481 rows x 4 columns]

I was wanting to calculate the maximum height value for each month for each year.

I have converted the date column to datetime as follows:

data['Date'] = pd.to_datetime(data['Date'])

And I have done this to get the max height for each year which works:

year_max = data.loc[data.groupby(data['Date'].dt.year)['Height'].idxmax().values]

However when trying to replicate it except for each month of each year, as follows, it gives very weird results:

month_max = data.loc[data.groupby(data['Date'].dt.strftime('%m-%y'))['Height'].idxmax().values]

month_max
Out[172]: 
             Date  Height  Month  Year
303514 2000-01-21   2.832      1  2000
355067 2001-01-13   2.712      1  2001
405713 2002-01-01   2.862      1  2002
460778 2003-01-19   2.652      1  2003
513769 2004-01-22   2.695      1  2004
          ...     ...    ...   ...
93617  1995-12-24   2.892     12  1995
143002 1996-12-12   2.652     12  1996
147474 1997-12-01   2.712      1  1997
219299 1998-12-06   2.672      6  1998
276169 1999-12-07   2.892      7  1999

[493 rows x 4 columns]

Any help would be greatly appreciated! :)

CodePudding user response:

use groupby and take the max of the height column

df.groupby(['Year', 'Month'], as_index=False)['Height'].max()
Year    Month   Height
0   1971    12  0.672
1   2022    3   1.260

OR

take the max height by Year, Month and compare with the height in the DF, using Loc

df.loc[(df.groupby(['Year', 'Month'])['Height'].transform(max)).eq(df['Height'])] 

changed DF to Data

data.loc[(data.groupby(['Year', 'Month'])['Height'].transform(max)).eq(data['Height'])] 
                  Date Height Month Year
0           1971-01-12  0.672   12  1971
1669480     2022-03-20  1.260   3   2022

CodePudding user response:

I think to achaive your goal is simpler to groupby multiple columns. In this sense try: df_grouped_month_year = df.groupby(['year','month'])['Height'].max()

  • Related