I have a dataframe df which looks like this:
time(float) | value (float) |
---|---|
10.45 | 10 |
10.50 | 20 |
10.55 | 25 |
11.20 | 30 |
11.44 | 20 |
12.30 | 30 |
I need help calculating a new column called rolling_average_value, which is the average of that row's value and all values within the 1 hour before it, so that the new dataframe looks like this:
time(float) | value (float) | rolling_average_value |
---|---|---|
10.45 | 10 | 10 |
10.50 | 20 | 15 |
10.55 | 25 | 18.33 |
11.20 | 30 | 21.25 |
11.44 | 20 | 21 |
12.30 | 30 | 25 |
Note: the time column is a float, not a datetime column.
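For reproducibility, the frame above can be built as follows (a minimal sketch; column names kept exactly as in the table):
import pandas as pd

df = pd.DataFrame({
    'time(float)': [10.45, 10.50, 10.55, 11.20, 11.44, 12.30],
    'value (float)': [10, 20, 25, 30, 20, 30],
})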
CodePudding user response:
You can temporarily set a datetime index and apply rolling.mean:
# extract hours/minutes from the float time column
import numpy as np
import pandas as pd

minutes, hours = np.modf(df['time(float)'])
hours = hours.astype(int)
minutes = minutes.mul(100).round().astype(int)  # round to avoid float-precision truncation

# build a datetime series (the date part is irrelevant, only %H%M is parsed)
dt = pd.to_datetime(hours.astype(str) + minutes.astype(str).str.zfill(2), format='%H%M')

# perform the rolling computation on a temporary datetime index,
# then restore the original index
df['rolling_mean'] = (df.set_axis(dt)
                        .rolling('1h')['value (float)']
                        .mean()
                        .set_axis(df.index)
                      )
output:
time(float) value (float) rolling_mean
0 10.45 10 10.000000
1 10.50 20 15.000000
2 10.55 25 18.333333
3 11.20 30 21.250000
4 11.44 20 21.000000
5 12.30 30 25.000000
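A quick sanity check that dt really holds the intended clock times (illustrative; the date part defaults to 1900-01-01 and does not affect the 1h window):
print(dt.dt.strftime('%H:%M').tolist())
# expected for the sample data: ['10:45', '10:50', '10:55', '11:20', '11:44', '12:30']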
Alternative to compute dt, formatting the floats with two decimals first so that trailing zeros are kept (astype(str) would turn 11.20 into '11.2'):
dt = pd.to_datetime(df['time(float)'].map('{:.2f}'.format)  # keep two decimals: 11.2 -> '11.20'
                      .str.replace(r'\d+', lambda x: x.group().zfill(2),
                                   regex=True),             # zero-pad single-digit hours: '9.45' -> '09.45'
                    format='%H.%M')
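If this is needed for several frames, the whole recipe fits in a small helper (a minimal sketch using the first dt construction above; the name float_time_rolling_mean is made up here):
import numpy as np
import pandas as pd

def float_time_rolling_mean(df, time_col, value_col, window='1h'):
    # split the HH.MM float into hour and minute parts
    minutes, hours = np.modf(df[time_col])
    dt = pd.to_datetime(hours.astype(int).astype(str)
                        + minutes.mul(100).round().astype(int).astype(str).str.zfill(2),
                        format='%H%M')
    # roll over a temporary datetime index, then restore the original index
    return (df.set_axis(dt)
              .rolling(window)[value_col]
              .mean()
              .set_axis(df.index))

df['rolling_mean'] = float_time_rolling_mean(df, 'time(float)', 'value (float)')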
CodePudding user response:
Assuming your data frame is sorted by time, you can also use a simple list comprehension. Iterate over times, collect the indices of all earlier values whose distance to the current time is less than one (which, with this HH.MM encoding, means less than one hour, since the minute part never reaches .60), and slice the values array by those indices. Then compute the mean of the sliced array:
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {"time": [10.45, 10.5, 10.55, 11.2, 11.44, 12.3],
     "value": [10, 20, 25, 30, 20, 30]}
)

times = df["time"].values
values = df["value"].values

# for each row i, average all values whose time lies within the hour before times[i]
df["rolling_mean"] = [
    round(np.mean(values[np.where(times[i] - times[:i + 1] < 1)[0]]), 2)
    for i in range(len(times))
]
If your data frame is large, you can JIT-compile this loop with numba to make it significantly faster:
from numba import njit

@njit
def compute_rolling_mean(times, values):
    # same logic as above, compiled to machine code by numba
    return [round(np.mean(values[np.where(times[i] - times[:i + 1] < 1)[0]]), 2)
            for i in range(len(times))]

df["rolling_mean"] = compute_rolling_mean(df["time"].values, df["value"].values)
Output:
time value rolling_mean
0 10.45 10 10.00
1 10.50 20 15.00
2 10.55 25 18.33
3 11.20 30 21.25
4 11.44 20 21.00
5 12.30 30 25.00
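As a quick check of the window logic for a single row: row i = 4 (time 11.44) is within one hour of every earlier time in the sample, so its window covers all five rows up to and including it (a small verification sketch reusing times and values from above):
i = 4
window = values[np.where(times[i] - times[:i + 1] < 1)[0]]
print(window)                   # [10 20 25 30 20]
print(round(window.mean(), 2))  # 21.0, matching rolling_mean for row 4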