Pandas for each new value in a column, remove the following two rows

Time:02-19

I have the following dataframe:

time   alarm
0       0
1       1
2       0
3       1
4       1
5       1
6       1
7       0
8       0
9       1
10      0

The column alarm represents an alarm. If it rings, it takes value 1.
Each time the alarm rings, I want to "silence" the next two rows. Then, if it rings again after the silenced period, I want to silence the next two rows, and so on.

In other words, I want to obtain the following dataframe:

time   alarm    silenced
0       0       no
1       1       no
2       0       yes
3       1       yes
4       1       no
5       1       yes
6       1       yes
7       0       no
8       0       no
9       1       no
10      0       yes

I managed to do it using a for loop or a lambda function, but I have to speed up the computation.
Can somebody help me? Thank you in advance!


P.S. Since I will eventually remove the "silenced" rows, a solution that removes those rows directly is also acceptable. In that case, the result should be:

time   alarm
0       0
1       1
4       1
7       0
8       0
9       1
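
A minimal sketch of that removal step, assuming the silenced flag is already available as a boolean column. The name `df_expected` is only illustrative, and the `silenced` values are copied from the expected table above rather than computed:

```python
import pandas as pd

# Expected result from the tables above, with "silenced" stored as booleans
df_expected = pd.DataFrame({
    "time": list(range(11)),
    "alarm": [0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
    "silenced": [False, False, True, True, False, True,
                 True, False, False, False, True],
})

# Keep only the rows that are not silenced and drop the helper column
df_kept = df_expected.loc[~df_expected["silenced"], ["time", "alarm"]]
print(df_kept)
```

The boolean mask keeps the original index, so the surviving rows are still labelled 0, 1, 4, 7, 8, 9.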

MY ATTEMPT using a for loop in an auxiliary function:

import numpy as np
import pandas as pd

df = pd.DataFrame({"time":[0,1,2,3,4,5,6,7,8,9,10], "alarm":[0,1,0,1,1,1,1,0,0,1,0]})
df

def fun_silence(df):
    
    # bool: if True,  we are in a "silent" period 
    #       if False, we can consider the current time as a possible alarm
    flag_silent = False
    
    # time of the *last* alarm
    alarm_time = np.nan
    
    # loop over rows
    for index, row in df.iterrows():
        
        # if we are in a silent period
        if flag_silent:
            
            # if 2 time steps passed from the last alarm, we end the silent period
            if row['time'] - alarm_time > 2:
                flag_silent = False
                
            # otherwise, we mark this row as silenced
            else:
                df.at[index, 'silenced'] = 1
          
        # if there is an alarm and we are not in a silent period
        if row['alarm'] == 1 and not flag_silent:
            # save the alarm time
            alarm_time = row['time']
            # enter a silent period
            flag_silent = True
            
    return df
    
df['silenced'] = 0
df_silenced = fun_silence(df)
df_silenced

Answer:

I think you cannot avoid the for loop in this problem, but you can certainly optimize the function and then compile it with numba to achieve C-like speed on large datasets:

from numba import njit

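@njit
def silence(alarm):
    # number of upcoming rows that still have to be silenced
    count = 0
    for a in alarm:
        if count > 0:
            # inside a silenced period: mark the row and count down
            yield True
            count -= 1
        elif count == 0 and a == 1:
            # a new alarm outside a silenced period: silence the next two rows
            count = 2
            yield False
        else:
            # no alarm and no active silenced period
            yield False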

    
df['silenced'] = [*silence(df['alarm'].to_numpy())]

    time  alarm  silenced
0      0      0     False
1      1      1     False
2      2      0      True
3      3      1      True
4      4      1     False
5      5      1      True
6      6      1      True
7      7      0     False
8      8      0     False
9      9      1     False
10    10      0      True
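
As a possible variant of the generator above, the same counting logic can fill a preallocated boolean array, which keeps the whole loop inside one function and also covers the "remove the rows directly" case from the P.S. This is only a sketch: the names `silence_mask` and `n_silent` are hypothetical, and it is written in plain NumPy. Decorating it with numba's `njit` should be possible in principle, but that is an assumption not verified here.

```python
import numpy as np

def silence_mask(alarm, n_silent=2):
    """Return a boolean array that is True where a row is silenced."""
    silenced = np.zeros(len(alarm), dtype=np.bool_)
    remaining = 0  # rows still to silence after the last accepted alarm
    for i in range(len(alarm)):
        if remaining > 0:
            # inside a silenced period: mark the row and count down
            silenced[i] = True
            remaining -= 1
        elif alarm[i] == 1:
            # accepted alarm: start a new silenced period
            remaining = n_silent
    return silenced

alarm = np.array([0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0])
mask = silence_mask(alarm)

# Positions of the rows that survive once silenced rows are removed
kept_times = np.arange(len(alarm))[~mask]
print(kept_times)
```

With a dataframe, the same mask could be used as `df[~silence_mask(df['alarm'].to_numpy())]` to drop the silenced rows in one step.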