I have a dataset containing hourly temperatures for a year. So, I have 24 entries for each day (temp for every hour) and I want to find out the 5 days with highest temp. I am aware of nlargest() function to find out 5 max values but those values happen to be on a single day only. How do I find out the 5 max values but on different days?
I tried using nlargest() and .loc() but could not find the solution. Please help.
I have attached what the dataset looks like.
CodePudding user response:
You might want to get the max per group with groupby.max
then find the top 5 with nlargest
df.groupby(['year','month','day'])['temp'].max().nlargest(5)
CodePudding user response:
You can use groupby
An example would be
max_5_temps = df.groupby('date_column')['temperature_column'].max().nlargest(5)
CodePudding user response:
Use GroupBy.max
for rows per days by maximal temperature with Series.nlargest
:
df1 = df.groupby(['year','month','day'])['temp'].max().nlargest(5).reset_index(name='top5')
If need also another rows use DataFrameGroupBy.idxmax
with DataFrame.nlargest
:
df2 = df.loc[df.groupby(['year','month','day'])['temp'].idxmax()].nlargest(5, 'temp')