Assume I have the following dataframe:
Time Flag1
0 0
10 0
30 0
50 1
70 1
90 0
110 0
My goal is to identify if within any window that time is less than lets the number in the row plus 35, then if any element of flag is 1 then that row would be 1. For example consider the above example:
The first element of time is 0 then 0 35 = 35 then in the window of values less than 35 (which is Time =0, 10, 30) all the flag1 values are 0 therefore the first row will be assigned to 0 and so on. Then the next window will be 10 35 = 45 and still will include (0,10,30) and the flag is still 0. So the complete output is:
Time Flag1 Output
0 0 0
10 0 0
30 0 1
50 1 1
70 1 1
90 1 1
110 1 1
To implement this type of problem, I thought I can use two for loops like this:
Output = []
for ii in range(Data.shape[0]):
count =0
th = Data.loc[ii,'Time'] 35
for jj in range(ii,Data.shape[0]):
if (Data.loc[jj,'Time'] < th and Data.loc[jj,'Flag1'] == 1):
count = 1
break
output.append(count)
However this looks tedious. since the inner for loop should go for continue for the entire length of data. Also I am not sure if this method checks the boundary cases for out of bound index when we are reaching to end of the dataframe. I appreciate if someone can comment on something easier than this. This is like a sliding window operation only comparing number to a threshold.
Edit: I do not want to compare two consecutive rows only. I want if for example 30 35 = 65 then as long as time is less than 65 then if flag1 is 1 then output is 1.
The second example:
Time Flag1 Output
0 0 0
30 0 1
40 0 1
60 1 1
90 1 1
140 1 1
200 1 1
350 1 1
CodePudding user response:
Assuming a window k
rows before and k
rows after as mentioned in my comment:
import pandas as pd
Data = pd.DataFrame([[0,0], [10,0], [30,0], [50,1], [70,1], [90,1], [110,1]],
columns=['Time', 'Flag1'])
k = 1 # size of window: up to k rows before and up to k rows after
n = len(Data)
output = [0]*n
for i in range(n):
th = Data['Time'][i] 35
j0 = max(0, i - k)
j1 = min(i k 1, n) # the 1 is because range is non-inclusive of end
output[i] = int(any((Data['Time'][j0 : j1] < th) & (Data['Flag1'][j0 : j1] > 0)))
Data['output'] = output
print(Data)
gives the same output as the original example. And you can change the size of the window my modifying k
.
Of course, if the idea is to check any row afterward, then just use j1 = n
in my example.
CodePudding user response:
import pandas as pd
Data = pd.DataFrame([[0,0],[10,0],[30,0],[50,1],[70,1],[90,1],[110,1]],columns=['Time','Flag1'])
output = Data.index.map(lambda x: 1 if any((Data.Time[x 1:]<Data.Time[x] 35)*(Data.Flag1[x 1:]==1)) else 0).values
output[-1] = Data.Flag1.values[-1]
Data['output'] = output
print(Data)
# show
Time Flag1 output
0 0 0
30 0 1
40 0 1
50 1 1
70 1 1
90 1 1
110 1 1