Getting the max value and the time the max value occurs for all periods in a pandas df

Time:10-10

I have a pandas dataframe which looks like this:

    Concentr 1  Concentr 2     Time
 0    25.4        0.48       00:01:00
 1    26.5        0.49       00:02:00
 2    25.2        0.52       00:03:00
 3    23.7        0.49       00:04:00
 4    23.8        0.55       00:05:00
 5    24.6        0.53       00:06:00
 6    26.3        0.57       00:07:00
 7    27.1        0.59       00:08:00
 8    28.8        0.56       00:09:00
 9    23.9        0.54       00:10:00
 10   25.6        0.49       00:11:00
 11   27.5        0.56       00:12:00
 12   26.3        0.55       00:13:00
 13   25.3        0.54       00:14:00

and I want to keep the max value of Concentr 1 for every 5-minute interval, along with the time it occurred and the value of Concentr 2 at that time. So, for the previous example I would like to have:

     Concentr 1    Concentr 2     Time
 0     26.5          0.49       00:02:00
 1     28.8          0.56       00:09:00
 2     27.5          0.56       00:12:00

My current approach would be i) to create an auxiliary variable with an ID for each 5-min interval, e.g. 00:00 to 00:05 would be interval 1, 00:05 to 00:10 would be interval 2, etc., ii) to use the interval variable in a groupby to get the max Concentr 1 per interval, and iii) to merge back to the initial df on both the interval variable and Concentr 1, thus identifying the corresponding time.
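For reference, the three steps described above could be sketched roughly as follows. This is only an illustration: the column name `interval` and the variable names are assumptions, and the intervals are numbered from 0 and taken as left-closed (00:00:00 to 00:04:59 is interval 0).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Concentr 1": [25.4, 26.5, 25.2, 23.7, 23.8, 24.6, 26.3,
                   27.1, 28.8, 23.9, 25.6, 27.5, 26.3, 25.3],
    "Concentr 2": [0.48, 0.49, 0.52, 0.49, 0.55, 0.53, 0.57,
                   0.59, 0.56, 0.54, 0.49, 0.56, 0.55, 0.54],
    "Time": [f"00:{m:02d}:00" for m in range(1, 15)],
})

# i) interval ID: number of whole 5-minute blocks elapsed since 00:00:00
#    ("interval" is a hypothetical helper column).
df["interval"] = pd.to_timedelta(df["Time"]) // pd.Timedelta(minutes=5)

# ii) maximum of "Concentr 1" per interval
max_per_interval = df.groupby("interval", as_index=False)["Concentr 1"].max()

# iii) merge back on both keys to recover the matching rows
merged = df.merge(max_per_interval, on=["interval", "Concentr 1"])
result = merged[["Concentr 1", "Concentr 2", "Time"]]
print(result)
```

One caveat with the merge step: if the maximum occurs more than once within an interval, every tied row is returned.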

I would like to ask if there is a better / more efficient / more elegant way to do it.

Thank you very much for any help.

CodePudding user response:

You can do a regular resample / groupby, and use the idxmax method to get the desired row for each group. Then use that to index your original data:

>>> df.loc[df.resample('5min', on='Time')['Concentr 1'].idxmax()]
    Concentr 1  Concentr 2                Time
1         26.5        0.49 2021-10-09 00:02:00
8         28.8        0.56 2021-10-09 00:09:00
11        27.5        0.56 2021-10-09 00:12:00

This assumes your 'Time' column is datetime-like, which I achieved with pd.to_datetime. You can convert the time column back to strings with strftime. So in full:

import pandas as pd

df['Time'] = pd.to_datetime(df['Time'])
# idxmax returns the index label of the row holding each 5-minute maximum
result = df.loc[df.resample('5min', on='Time')['Concentr 1'].idxmax()]
result['Time'] = result['Time'].dt.strftime('%H:%M:%S')

Giving:

    Concentr 1  Concentr 2      Time
1         26.5        0.49  00:02:00
8         28.8        0.56  00:09:00
11        27.5        0.56  00:12:00
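To try this end to end, here is a self-contained sketch that rebuilds the sample frame from the question and applies the same resample / idxmax idea. The explicit `format` argument and the `.copy()` call are my own additions rather than part of the answer; with a time-only format, pandas fills in a placeholder date, which is discarded again by strftime.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Concentr 1": [25.4, 26.5, 25.2, 23.7, 23.8, 24.6, 26.3,
                   27.1, 28.8, 23.9, 25.6, 27.5, 26.3, 25.3],
    "Concentr 2": [0.48, 0.49, 0.52, 0.49, 0.55, 0.53, 0.57,
                   0.59, 0.56, 0.54, 0.49, 0.56, 0.55, 0.54],
    "Time": [f"00:{m:02d}:00" for m in range(1, 15)],
})

# Parse the strings as datetimes (the date part is only a placeholder).
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S")

# For each 5-minute bin, the index label of the row with the largest value.
idx = df.resample("5min", on="Time")["Concentr 1"].idxmax()

# Select those rows and turn the timestamps back into HH:MM:SS strings.
result = df.loc[idx].copy()
result["Time"] = result["Time"].dt.strftime("%H:%M:%S")
print(result)
```

Unlike a merge on the maximum value, idxmax picks a single row per bin (the first occurrence), so ties do not produce duplicate rows.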

CodePudding user response:

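df = df.set_index('Time')  # 'Time' must already be datetime-like
# np.argmax would give a position within each bin; idxmax gives the index label
idx = df.resample('5min')['Concentr 1'].idxmax()
df = df.loc[idx]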

Then you would probably need to reset_index() if you do not wish Time to be your index.

CodePudding user response:

You can also use this:

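Group every n = 5 rows and keep the rows of the original df whose index is the idxmax of "Concentr 1" within each group. Note that this groups by row position rather than by clock time, so it assumes a default integer index and exactly one row per minute: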

df = df[df.index.isin(df.groupby(df.index // 5)["Concentr 1"].idxmax())]
print(df)

Output:

    Concentr 1  Concentr 2      Time
1         26.5        0.49  00:02:00
8         28.8        0.56  00:09:00
11        27.5        0.56  00:12:00
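If the rows are not guaranteed to be evenly spaced, a variation on the same idea is to group on the time itself, floored to 5 minutes, instead of on the row number. The following is a sketch under that assumption, not part of the answer above; the name `bins` is hypothetical, and 'Time' is assumed to hold HH:MM:SS strings as in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Concentr 1": [25.4, 26.5, 25.2, 23.7, 23.8, 24.6, 26.3,
                   27.1, 28.8, 23.9, 25.6, 27.5, 26.3, 25.3],
    "Concentr 2": [0.48, 0.49, 0.52, 0.49, 0.55, 0.53, 0.57,
                   0.59, 0.56, 0.54, 0.49, 0.56, 0.55, 0.54],
    "Time": [f"00:{m:02d}:00" for m in range(1, 15)],
})

# Floor each time of day to the start of its 5-minute bin.
# The 'Time' column itself is left untouched as strings.
bins = pd.to_timedelta(df["Time"]).dt.floor("5min")

# Index label of the maximum per bin, then select those rows.
result = df.loc[df.groupby(bins)["Concentr 1"].idxmax()]
print(result)
```

Here the bins follow the clock (00:00 to 00:04, 00:05 to 00:09, and so on), so a missing or extra row does not shift later groups.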