I have the following dataframe:
time alarm
0 0
1 1
2 0
3 1
4 1
5 1
6 1
7 0
8 0
9 1
10 0
The column alarm represents an alarm: it takes the value 1 when the alarm rings.
Each time the alarm rings, I want to "silence" the next two rows. Then, if it rings again after the silenced period, I want to silence the next two rows, and so on.
In other words, I want to obtain the following dataframe:
time alarm silenced
0 0 no
1 1 no
2 0 yes
3 1 yes
4 1 no
5 1 yes
6 1 yes
7 0 no
8 0 no
9 1 no
10 0 yes
I managed to do it using a for loop or a lambda function, but I have to speed up the computation.
Can somebody help me? Thank you in advance!
P.S. Since I will eventually remove the "silenced" rows, a solution that directly removes those rows will also be accepted. In that case, the result should be:
time alarm
0 0
1 1
4 1
7 0
8 0
9 1
MY ATTEMPT using a for loop in an auxiliary function:
import numpy as np
import pandas as pd

df = pd.DataFrame({"time":[0,1,2,3,4,5,6,7,8,9,10], "alarm":[0,1,0,1,1,1,1,0,0,1,0]})
df

def fun_silence(df):
    # bool: if True, we are in a "silent" period
    #       if False, we can consider the current time as a possible alarm
    flag_silent = False
    # time of the *last* alarm
    alarm_time = np.nan
    # loop over rows
    for index, row in df.iterrows():
        # if we are in a silent period
        if flag_silent:
            # if 2 time steps passed from the last alarm, we end the silent period
            if row['time'] - alarm_time > 2:
                flag_silent = False
            # otherwise, we mark this row as silenced
            else:
                df.at[index, 'silenced'] = 1
        # if there is an alarm and we are not in a silent period
        if row['alarm'] == 1 and not flag_silent:
            # save the alarm time
            alarm_time = row['time']
            # enter in a silent period
            flag_silent = True
    return df

df['silenced'] = 0
df_silenced = fun_silence(df)
df_silenced
Answer:
I don't think you can avoid the for loop in this problem, but you can certainly simplify the function and then compile it with numba to get C-like speed on large datasets:
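from numba import njit

@njit
def silence(alarm):
    # number of upcoming rows that still have to be silenced
    count = 0
    for a in alarm:
        if count > 0:
            # inside a silenced window: mark the row and count down
            yield True
            count -= 1
        elif count == 0 and a == 1:
            # a new alarm outside any window: silence the next two rows
            count = 2
            yield False
        else:
            yield False

df['silenced'] = [*silence(df['alarm'].to_numpy())]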
    time  alarm  silenced
0      0      0     False
1      1      1     False
2      2      0      True
3      3      1      True
4      4      1     False
5      5      1      True
6      6      1      True
7      7      0     False
8      8      0     False
9      9      1     False
10    10      0      True
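For the P.S. in the question (dropping the silenced rows directly), a possible variant of the same idea is sketched below. It fills a preallocated boolean array instead of yielding values, then uses that array as a mask. The names silence_mask, n_silent and df_kept are made up for this illustration. Like the generator above, it counts rows rather than time differences, so it assumes the rows are evenly spaced in time. The try/except is only there so that the snippet still runs, uncompiled, when numba is not installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit  # optional: compiles the loop when numba is available
except ImportError:
    def njit(func):  # fallback: run the same loop as plain Python
        return func


@njit
def silence_mask(alarm, n_silent):
    # out[i] is True when row i falls inside a silenced window
    out = np.zeros(alarm.shape[0], dtype=np.bool_)
    remaining = 0  # rows still to silence after the last accepted alarm
    for i in range(alarm.shape[0]):
        if remaining > 0:
            out[i] = True
            remaining -= 1
        elif alarm[i] == 1:
            # alarm outside any window: open a new window of n_silent rows
            remaining = n_silent
    return out


df = pd.DataFrame({"time": list(range(11)),
                   "alarm": [0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]})

mask = silence_mask(df["alarm"].to_numpy(), 2)
df["silenced"] = mask

# keep only the rows that were not silenced
df_kept = df.loc[~mask, ["time", "alarm"]]
```

On the sample data, df_kept should contain the rows with time 0, 1, 4, 7, 8 and 9, which is the table requested in the P.S.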