Sliding Window and comparing elements of DataFrame to a threshold

Time:09-24

Assume I have the following dataframe:

Time  Flag1 
0      0        
10     0
30     0
50     1
70     1
90     0
110    0

My goal is the following: for each row, look at the window of rows whose Time is less than that row's Time plus 35; if any Flag1 value in that window is 1, that row's output is 1, otherwise it is 0. For example, consider the data above:

The first Time value is 0, so the threshold is 0 + 35 = 35. The rows with Time less than 35 are Time = 0, 10, 30, and all of their Flag1 values are 0, so the first row is assigned 0. The next threshold is 10 + 35 = 45; the window still includes (0, 10, 30) and Flag1 is still 0 there, so the second row is also 0, and so on. So the complete output is:

Time  Flag1   Output
0      0         0   
10     0         0
30     0         1
50     1         1
70     1         1
90     1         1
110    1         1
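Stated as a formula, and assuming (as the nested loop below does) that the window for row i contains only row i and the rows after it, the rule is:

```latex
\text{Output}_i =
\begin{cases}
1 & \text{if } \exists\, j \ge i \text{ such that } \text{Time}_j < \text{Time}_i + 35 \text{ and } \text{Flag1}_j = 1,\\
0 & \text{otherwise.}
\end{cases}
```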

To implement this, I thought I could use two for loops like this:

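# Data is the DataFrame shown above
output = []
for ii in range(Data.shape[0]):
    count = 0
    th = Data.loc[ii, 'Time'] + 35  # threshold for row ii
    # check row ii and every row after it
    for jj in range(ii, Data.shape[0]):
        if Data.loc[jj, 'Time'] < th and Data.loc[jj, 'Flag1'] == 1:
            count = 1
            break
    output.append(count)
Data['Output'] = output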

However, this looks tedious, since the inner for loop may run over the entire remaining length of the data for every row. I am also not sure whether this method handles the boundary cases (an out-of-bounds index) when we reach the end of the DataFrame. I would appreciate it if someone could suggest something simpler. This is like a sliding-window operation, except that it compares a number to a threshold.
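One way to avoid the inner loop, sketched under two assumptions: Time is sorted in ascending order (as in both examples), and the window for row i is row i plus the later rows, as in the loop above. `np.searchsorted` finds, for every row at once, the first position whose Time reaches Time + 35, and a cumulative sum of the flags then counts the flagged rows inside each window. The name `flag_within_window` and its parameters are made up for this sketch, and the data uses the Flag1 values from the expected-output table (1 for Time 90 and 110).

```python
import numpy as np
import pandas as pd


def flag_within_window(df, width=35, time_col="Time", flag_col="Flag1"):
    """Return 1 for row i if any row j >= i with Time[j] < Time[i] + width has flag 1.

    Assumes df[time_col] is sorted in ascending order.
    """
    times = df[time_col].to_numpy()
    flags = (df[flag_col].to_numpy() == 1).astype(int)
    # end[i] = first position whose time is >= Time[i] + width,
    # so the window for row i is the half-open range [i, end[i]).
    end = np.searchsorted(times, times + width, side="left")
    # prefix[m] = number of flagged rows among positions 0..m-1
    prefix = np.concatenate(([0], np.cumsum(flags)))
    start = np.arange(len(times))
    # flagged rows in [i, end[i]) = prefix[end[i]] - prefix[i]
    return ((prefix[end] - prefix[start]) > 0).astype(int)


Data = pd.DataFrame({"Time": [0, 10, 30, 50, 70, 90, 110],
                     "Flag1": [0, 0, 0, 1, 1, 1, 1]})
Data["Output"] = flag_within_window(Data)
print(Data)
```

The search costs O(n log n) and the prefix sums O(n). No index can run past the end of the frame, because `searchsorted` returns at most `len(times)` and `prefix` has `len(times) + 1` entries.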

Edit: I do not want to compare only two consecutive rows. For example, for Time = 30 the threshold is 30 + 35 = 65; every row whose Time is less than 65 is in the window, and if Flag1 is 1 in any of them, the output is 1.

The second example:

Time  Flag1   Output
0      0         0   
30     0         1
40     0         1
60     1         1
90     1         1
140    1         1
200    1         1
350    1         1
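Because Time increases in both examples, the end of the window can only move forward as the current row advances, so a two-pointer scan touches each row a constant number of times (O(n) overall). A minimal sketch, assuming sorted Time and a window that starts at the current row; `flag_two_pointer` and `Data2` are hypothetical names, and the data is the second example above:

```python
import pandas as pd


def flag_two_pointer(times, flags, width=35):
    """Two-pointer scan; assumes `times` is sorted ascending.

    The window for row i is rows i..end-1, where end is the first row with
    times[end] >= times[i] + width. Since times are sorted, `end` never moves back.
    """
    n = len(times)
    out = [0] * n
    end = 0        # exclusive end of the window for the current row
    flagged = 0    # how many rows in [i, end) have flag == 1
    for i in range(n):
        if end < i:                # only possible when width <= 0
            end, flagged = i, 0
        # grow the window while the next row's time is still below the threshold
        while end < n and times[end] < times[i] + width:
            flagged += int(flags[end] == 1)
            end += 1
        out[i] = int(flagged > 0)
        # drop row i before moving on; it is in the window only if end > i
        if end > i and flags[i] == 1:
            flagged -= 1
    return out


Data2 = pd.DataFrame({"Time": [0, 30, 40, 60, 90, 140, 200, 350],
                      "Flag1": [0, 0, 0, 1, 1, 1, 1, 1]})
Data2["Output"] = flag_two_pointer(Data2["Time"].tolist(), Data2["Flag1"].tolist())
print(Data2)
```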

Answer:

Assuming a window of up to k rows before and k rows after, as mentioned in my comment:

import pandas as pd

Data = pd.DataFrame([[0,0], [10,0], [30,0], [50,1], [70,1], [90,1], [110,1]],
                    columns=['Time', 'Flag1'])

k = 1   # size of window: up to k rows before and up to k rows after 
n = len(Data)
output = [0]*n
for i in range(n):
    th = Data['Time'].iloc[i] + 35
    j0 = max(0, i - k)
    j1 = min(i + k + 1, n)  # the +1 is because slicing excludes the end index
    output[i] = int(any((Data['Time'].iloc[j0:j1] < th) & (Data['Flag1'].iloc[j0:j1] > 0)))
Data['output'] = output

print(Data)

This gives the same output as the original example, and you can change the size of the window by modifying k.

Of course, if the idea is to check any row afterward, then just use j1 = n in my example.
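The k-rows-before-and-after loop above can also be written without a Python loop, assuming NumPy 1.20 or newer (which provides `numpy.lib.stride_tricks.sliding_window_view`). Padding both ends with k dummy rows (Time = +inf, Flag1 = 0) makes out-of-range neighbours drop out of the comparison instead of needing the `max`/`min` clipping. `flag_k_rows` is a hypothetical helper name used only for this sketch:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def flag_k_rows(df, k=1, width=35):
    """Vectorized version of the k-rows-before/after loop (hypothetical helper)."""
    times = df["Time"].to_numpy(dtype=float)
    flags = df["Flag1"].to_numpy()
    # Dummy padding rows: time=+inf never passes `< threshold`, flag=0 never counts.
    t_pad = np.pad(times, k, constant_values=np.inf)
    f_pad = np.pad(flags, k, constant_values=0)
    # Row i of each view holds the 2k+1 values centred on original row i.
    t_win = sliding_window_view(t_pad, 2 * k + 1)
    f_win = sliding_window_view(f_pad, 2 * k + 1)
    hit = (t_win < (times + width)[:, None]) & (f_win > 0)
    return hit.any(axis=1).astype(int)


Data = pd.DataFrame([[0, 0], [10, 0], [30, 0], [50, 1], [70, 1], [90, 1], [110, 1]],
                    columns=["Time", "Flag1"])
Data["output"] = flag_k_rows(Data, k=1)
print(Data)
```

With k = 0 each window is just the row itself, so the output reduces to Flag1.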

Answer:

import pandas as pd

Data = pd.DataFrame([[0,0],[10,0],[30,0],[50,1],[70,1],[90,1],[110,1]],columns=['Time','Flag1'])

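# Slice from x (not x + 1) so that row x itself is in its own window, as in the
# question's loop; this also covers the last row without special-casing it.
output = Data.index.map(lambda x: 1 if any((Data.Time.iloc[x:] < Data.Time.iloc[x] + 35) & (Data.Flag1.iloc[x:] == 1)) else 0).values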
Data['output'] = output

print(Data)

# output
Time    Flag1   output
0       0       0
10      0       0
30      0       1
50      1       1
70      1       1
90      1       1
110     1       1
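The per-row masks built by the lambda above can also be computed all at once with NumPy broadcasting. This sketch does not need Time to be sorted, but it builds an n × n boolean matrix, so it only suits small frames. Like the question's loop, it keeps row i itself in its own window (the diagonal kept by `np.triu`):

```python
import numpy as np
import pandas as pd

Data = pd.DataFrame({"Time": [0, 10, 30, 50, 70, 90, 110],
                     "Flag1": [0, 0, 0, 1, 1, 1, 1]})

t = Data["Time"].to_numpy()
f = Data["Flag1"].to_numpy() == 1
# in_window[i, j] is True when row j lies in row i's window:
# j >= i (upper triangle including the diagonal) and Time[j] < Time[i] + 35.
in_window = np.triu(t[None, :] < t[:, None] + 35)
# Row i gets 1 if any row in its window has Flag1 == 1.
Data["output"] = (in_window & f[None, :]).any(axis=1).astype(int)
print(Data)
```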