I have this dataframe:
Date Height Month Year
0 1971-01-12 0.672 12 1971
1 1971-01-12 0.332 12 1971
2 1971-01-12 0.122 12 1971
3 1971-01-12 0.002 12 1971
4 1971-01-12 0.302 12 1971
... ... ... ...
1669476 2022-03-20 1.180 3 2022
1669477 2022-03-20 1.200 3 2022
1669478 2022-03-20 1.220 3 2022
1669479 2022-03-20 1.240 3 2022
1669480 2022-03-20 1.260 3 2022
[1669481 rows x 4 columns]
I was wanting to calculate the maximum height value for each month for each year.
I have converted the date column to datetime as follows:
data['Date'] = pd.to_datetime(data['Date'])
And I have done this to get the max height for each year which works:
year_max = data.loc[data.groupby(data['Date'].dt.year)['Height'].idxmax().values]
However when trying to replicate it except for each month of each year, as follows, it gives very weird results:
month_max = data.loc[data.groupby(data['Date'].dt.strftime('%m-%y'))['Height'].idxmax().values]
month_max
Out[172]:
Date Height Month Year
303514 2000-01-21 2.832 1 2000
355067 2001-01-13 2.712 1 2001
405713 2002-01-01 2.862 1 2002
460778 2003-01-19 2.652 1 2003
513769 2004-01-22 2.695 1 2004
... ... ... ...
93617 1995-12-24 2.892 12 1995
143002 1996-12-12 2.652 12 1996
147474 1997-12-01 2.712 1 1997
219299 1998-12-06 2.672 6 1998
276169 1999-12-07 2.892 7 1999
[493 rows x 4 columns]
Any help would be greatly appreciated! :)
CodePudding user response:
use groupby and take the max of the height column
df.groupby(['Year', 'Month'], as_index=False)['Height'].max()
Year Month Height
0 1971 12 0.672
1 2022 3 1.260
OR
take the max height by Year, Month and compare with the height in the DF, using Loc
df.loc[(df.groupby(['Year', 'Month'])['Height'].transform(max)).eq(df['Height'])]
changed DF to Data
data.loc[(data.groupby(['Year', 'Month'])['Height'].transform(max)).eq(data['Height'])]
Date Height Month Year
0 1971-01-12 0.672 12 1971
1669480 2022-03-20 1.260 3 2022
CodePudding user response:
I think to achaive your goal is simpler to groupby multiple columns. In this sense try: df_grouped_month_year = df.groupby(['year','month'])['Height'].max()