Say I have the following dataframe:
import pandas as pd
import numpy as np
data = np.random.randint(1, 10, size=(10,2))
df = pd.DataFrame(data, columns=['x1', 'x2'])
df['switch'] = [1,1,0,0,1,1,0,0,1,1]
index_ = pd.date_range('2022-01-17 13:00:00', periods=10, freq='5s')
df.index = index_.rename('Time')
resulting in:
                     x1  x2  switch
Time
2022-01-17 13:00:00   2   6       1
2022-01-17 13:00:05   9   8       1
2022-01-17 13:00:10   4   9       0
2022-01-17 13:00:15   5   6       0
2022-01-17 13:00:20   4   9       1
2022-01-17 13:00:25   6   7       1
2022-01-17 13:00:30   4   6       0
2022-01-17 13:00:35   2   3       0
2022-01-17 13:00:40   4   9       1
2022-01-17 13:00:45   5   2       1
I'm looking for a way to get the start time, end time, and the means of x1 and x2 for each block of consecutive rows where switch is 1.
For the first block here, for example:
Start: 2022-01-17 13:00:00
End: 2022-01-17 13:00:05
x1 mean: 5.5
x2 mean: 7
I don't know how to detect the changes in the switch column, compute the means over the rows between those changes, and return the times at which the changes occur.
You could use groupby + agg with a custom group:
df2 = df.reset_index()                     # move 'Time' from the index into a column
df2['Time'] = pd.to_datetime(df2['Time'])  # harmless here, 'Time' is already datetime64
(df2[df2['switch'].eq(1)]                  # keep only rows with switch 1
 .groupby(df2['switch'].ne(1).cumsum())    # group by consecutive 1s (key is aligned on the index)
 .agg({'x1': 'mean', 'x2': 'mean', 'Time': ['min', 'max']})
)
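To see why df2['switch'].ne(1).cumsum() groups consecutive 1s, here is a minimal sketch of the key on its own, using the switch pattern from the question. Every 0 bumps the running count, so the count stays constant across each run of 1s and those rows share a group label.

```python
import pandas as pd

# Same switch pattern as in the question
switch = pd.Series([1, 1, 0, 0, 1, 1, 0, 0, 1, 1])

# True wherever switch is NOT 1; the running count of those rows
# does not change inside a run of consecutive 1s
key = switch.ne(1).cumsum()

print(key.tolist())                # [0, 0, 1, 2, 2, 2, 3, 4, 4, 4]
print(key[switch.eq(1)].tolist())  # [0, 0, 2, 2, 4, 4]
```

This is why the group labels in the output are 0, 2 and 4 rather than 0, 1 and 2.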
output (the numbers differ from the table in the question because the data is generated randomly):
         x1   x2                Time
       mean mean                 min                 max
switch
0       7.5  4.0 2022-01-17 13:00:00 2022-01-17 13:00:05
2       3.0  4.5 2022-01-17 13:00:20 2022-01-17 13:00:25
4       6.0  3.5 2022-01-17 13:00:40 2022-01-17 13:00:45
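Because the question uses np.random, a fresh run won't reproduce its table. The sketch below assumes you want to check the approach against the exact values printed in the question, so it hard-codes them. It also filters the key with the same mask instead of relying on index alignment, which should give the same groups. The first block should come out as 13:00:00 to 13:00:05 with means 5.5 and 7.0, as expected in the question.

```python
import pandas as pd

# Values copied from the printed frame in the question (not random)
df = pd.DataFrame(
    {'x1': [2, 9, 4, 5, 4, 6, 4, 2, 4, 5],
     'x2': [6, 8, 9, 6, 9, 7, 6, 3, 9, 2],
     'switch': [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]},
    index=pd.date_range('2022-01-17 13:00:00', periods=10, freq='5s', name='Time'),
)

df2 = df.reset_index()              # 'Time' becomes a regular column
on = df2['switch'].eq(1)            # rows inside a "switch on" block
key = df2['switch'].ne(1).cumsum()  # same label for each run of 1s

blocks = (df2[on]
          .groupby(key[on])         # mask the key explicitly, same rows as df2[on]
          .agg({'x1': 'mean', 'x2': 'mean', 'Time': ['min', 'max']}))
print(blocks)
```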
Alternatively, use named aggregation to get flat column names:
df2 = df.reset_index()
df2['Time'] = pd.to_datetime(df2['Time'])
(df2[df2['switch'].eq(1)]
.groupby(df2['switch'].ne(1).cumsum())
.agg(avg_x1=('x1', 'mean'),
avg_x2=('x2', 'mean'),
start=('Time', 'min'),
end=('Time', 'max'))
.reset_index(drop=True)
)
output:
avg_x1 avg_x2 start end
0 7.5 4.0 2022-01-17 13:00:00 2022-01-17 13:00:05
1 3.0 4.5 2022-01-17 13:00:20 2022-01-17 13:00:25
2 6.0 3.5 2022-01-17 13:00:40 2022-01-17 13:00:45
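A variation, not taken from the answer above, is to give every run (of 0s and of 1s) its own ID by comparing switch with its previous value via shift, then keep only the runs where switch is 1. The sketch below assumes the question's printed values and prints each block in the Start/End/mean layout the question asked for. For the first block it should print Start 13:00:00, End 13:00:05, and means 5.5 and 7.0.

```python
import pandas as pd

# Values copied from the printed frame in the question
df = pd.DataFrame(
    {'x1': [2, 9, 4, 5, 4, 6, 4, 2, 4, 5],
     'x2': [6, 8, 9, 6, 9, 7, 6, 3, 9, 2],
     'switch': [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]},
    index=pd.date_range('2022-01-17 13:00:00', periods=10, freq='5s', name='Time'),
)

df2 = df.reset_index()
# A new run starts wherever switch differs from the previous row
# (the first row compares against NaN, so it always starts run 1)
run_id = df2['switch'].ne(df2['switch'].shift()).cumsum()
on = df2['switch'].eq(1)

summary = (df2[on]
           .groupby(run_id[on])
           .agg(start=('Time', 'min'), end=('Time', 'max'),
                x1_mean=('x1', 'mean'), x2_mean=('x2', 'mean'))
           .reset_index(drop=True))

# Print each block in the format from the question
for row in summary.itertuples(index=False):
    print(f"Start: {row.start}")
    print(f"End: {row.end}")
    print(f"x1 mean: {row.x1_mean}")
    print(f"x2 mean: {row.x2_mean}")
```

Since run_id also labels the 0 runs, dropping the on filter would summarize those blocks too, which the ne(1).cumsum() key does not separate as cleanly.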